Sitecore xConnect – Cleaning up inactive xDB Contacts
Jason St-Cyr
Posted on December 23, 2019
Starting with the Sitecore 9.2 release, the xConnect API added new functions to delete information from the xDB. This lets you keep your data cleaner and more relevant, and avoid paying for storage you don’t need.
As pointed out in the documentation, this is not the same as doing a call for the “right to be forgotten”. Also, there are some limitations in the current version. In my opinion, the most important limitation is that you cannot target a subset of interactions for a Contact: you have to remove all the interactions along with the Contact.
Why you might need ‘partial’ deletes
As an example, you might have a frequently visiting Contact that has a lot of interactions taking up space in the database, but only one or two really important interactions, perhaps purchases, that are still relevant. With the current API, you would have to either get rid of the Contact and all of its interactions, or keep all of it. You cannot keep just the Contact and the important interactions.
So if you have a lot of returning Contacts and just want to clear out their old history, you’ll need to opt for a fancier delete approach. For these ‘fancy’ deletes, you might need some PowerShell wizardry or SQL scripts to precisely target different types of interactions and implement your exact business rules on data retention.
For this post, though, I want us to look at how to use the xConnect API as it is today, clearing out a Contact and all of its interactions.
Prefer to watch? A video demo is now available on Master Sitecore.
Executing a simple Delete
One use case is that you need to delete a specific Contact and all of its data, perhaps because of a sync triggered with an external system, or perhaps you have provided this functionality to the user to allow them to delete their profile.
I have updated my xConnect Tutorial repo with a DeleteContactTutorial class. This contains a version of the documentation tutorial so that you can debug and watch it in action. Below is a simplified version of what you can find in the GitHub repo:
//Get the existing contact that we want to delete
var existingContact = await GetContact(cfg, twitterId);
if (existingContact != null)
{
    //Add the delete operation onto the client for the specified contact and execute
    using (var client = new XConnectClient(cfg))
    {
        client.DeleteContact(existingContact);
        await client.SubmitAsync();
    }
}
Inside the code
- Create a Contact definition. You can do this in numerous ways, though in this example I make a call to get the Contact from xConnect to validate that it already exists before doing a delete. This extra round trip will not perform well under load, but it might be helpful if you need to check a particular facet value or some other data before executing a delete.
- Create a client. Before executing anything, we need an instance of the xConnect client to invoke.
- Add the operation. We invoke ‘DeleteContact’ with the Contact we want to delete. This adds the operation to the client, but does not execute it right away.
- Submit the operations. We ask the client to submit all of its queued operations to xConnect. In this case, we only have a single operation to execute. The ‘await’ pauses this method until the asynchronous call completes. I do this in the console app to allow for sequential processing and retrieving results, but you might want to execute completely asynchronously for performance.
What doesn’t get deleted immediately?
Deleting a Contact from xDB gets rid of a lot of stuff, but there are still some pieces that stick around:
Device profiles: These are stored in xdb_collection.DeviceProfiles and are not cleared out. If this table is getting out of hand, you might need a cleanup script that regularly trims out data you don’t want in here anymore.
xDB Index: The xDB Search Indexer will clean this up, but it is not directly executed by the API call so you need to wait for that job to run.
Marketing automation plans: Like the index, this isn’t directly invoked by the API call so you need to wait for the Marketing Automation Engine to process a high priority work item before the contact gets removed from automation plans.
Taking it to the next step – Cleaning up inactive Contacts
Now that we understand the concept of a delete, how do we use this for a more common scenario, like deleting old, irrelevant data? For example, you might have a lot of storage being used up by Contacts that haven’t interacted with your various sites and applications in 3 months. Depending on your business requirements, it may be more valuable to clear out this data and save some storage costs than to keep it in the hope that those Contacts come back.
In the xConnect Tutorial repo I’ve added a new ‘Delete Multiple Contacts’ tutorial which generates a scenario of 5 contacts that have no interactions.
There are two key pieces to this scenario:
- Finding the Contacts that are inactive/expired
- Deleting all of those inactive contacts
Finding inactive contacts
The GetContactsByLastActivityTutorial class shows an example of searching by Contact activity (interactions). This searches against the index using a query to find all Contacts that have had no interactions since a provided date.
The important piece of logic is:
//Set up options to restrict the number of interactions we return for each contact
var contactExpandOptions = new ContactExpandOptions();
contactExpandOptions.Interactions = new RelatedInteractionsExpandOptions() { Limit = 20 };
//Create a queryable to search by the date
IAsyncQueryable<Contact> queryable = client.Contacts
    .Where(c => (!c.Interactions.Any()) ||
                !c.Interactions.Any(x => x.StartDateTime > searchStartTime))
    .WithExpandOptions(contactExpandOptions);
//Invoke the query, fetching results in batches of 10
var enumerator = await queryable.GetBatchEnumerator(10);
//Collect all the matching contact IDs
var matchingContactIds = new List<Guid>();
while (await enumerator.MoveNext())
{
    foreach (var contact in enumerator.Current)
    {
        if (contact.Id.HasValue)
            matchingContactIds.Add(contact.Id.Value);
    }
}
Inside the code
- Set up the query options to return interactions. We are querying the Contacts index, but we want to make sure that some interactions are returned as well when the query is done. The RelatedInteractionsExpandOptions let us specify how we want to pull in interactions. In this case, we limit it to 20 interactions per Contact.
- Create a query for the search. This is where we really add the restrictions (where clause) that will define what we want to return. In this example, I want to get any contacts that:
  a. Have no interactions, or
  b. Have no interactions that occurred more recently than the ‘start time’ provided
- Execute the queryable. This gets us an enumerator which can iterate over the search results in batches.
- Collect results. In order to do anything with these search results and delete these contacts later, I need to collect the contact IDs from the enumerator. This means collecting everything in the current batch and then moving to the next batch of results.
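The query depends on searchStartTime, which the excerpt above does not show being set. As a minimal sketch, assuming that ‘inactive’ means no interactions in the last three months and that interaction times are compared in UTC, the cutoff could be computed like this. InactivityCutoff and FromMonths are hypothetical names used only for illustration. They are not part of xConnect or the tutorial repo.

```csharp
using System;

// Hypothetical helper (not part of xConnect): works out the cutoff date
// used to decide whether a contact counts as "inactive".
public static class InactivityCutoff
{
    // Returns the moment that lies the given number of months before nowUtc.
    // Contacts with no interaction after this moment would match the query.
    public static DateTime FromMonths(DateTime nowUtc, int months)
    {
        if (months <= 0)
            throw new ArgumentOutOfRangeException(nameof(months), "Retention window must be at least one month.");

        return nowUtc.AddMonths(-months);
    }
}
```

With this in place, the value passed into the query might be created with InactivityCutoff.FromMonths(DateTime.UtcNow, 3). Taking the current time as a parameter keeps the rule easy to test against fixed dates.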
Deleting inactive Contacts
The DeleteMultipleContacts tutorial shows one way to batch delete multiple contacts. The example here works with a list of Contact objects and deletes all of them.
The important chunk that is different from the single delete example seen earlier:
//Add an operation to the task for each contact in the list
foreach (var contact in contacts)
{
    client.DeleteContact(contact);
}
await client.SubmitAsync();
Inside the code
- Add all operations. Previously, we had just one Contact to delete so we just called DeleteContact on the client once. But now we need to do this for every Contact in the list to create the list of operations to execute.
- Execute the client. Exactly the same as a single delete, except in this case multiple operations will be performed.
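If the list of inactive Contacts is large, you may not want every delete operation to ride on a single submit. The sketch below is one possible way to split a list into smaller groups first. It is plain C# and makes no xConnect calls. BatchHelper, InBatches and the batch size are hypothetical names and values chosen for illustration, and whether your environment actually needs this kind of chunking is an assumption you should test.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of xConnect): splits any sequence into
// fixed-size groups so each group can be submitted separately.
public static class BatchHelper
{
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                // Hand back a full batch and start collecting the next one
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Hand back the final, partially filled batch if there is one
        if (batch.Count > 0)
            yield return batch;
    }
}
```

Each batch returned by BatchHelper.InBatches(contacts, 100) could then go through the same DeleteContact loop and SubmitAsync call shown above, one batch at a time.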
A queryable to rule all things
The most important thing to take away here is that deleting a Contact record is itself very straightforward. The complexity is in determining WHAT to delete. Building the search queryable that determines which Contacts should no longer exist is where you’ll need to put your time and testing!