Deleting all blobs in an Azure Storage / Data Lake container quickly
Wai Liu
Posted on September 20, 2023
The Azure CLI contains a delete-batch command to delete multiple blobs at once.
az storage blob delete-batch --account-key <storage_account_key> --account-name <storage_account_name> --source <container_name>
This article explains this in detail.
However, this command is sloooooow. If you have 100k+ blobs, it could take hours.
Another way to do it is programmatically. Here is an Azure Function that can be deployed to delete all your blobs much faster.
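using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Storage;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

public static class DeleteAllFilesInContainer
{
    // Function-level auth: callers must supply a function key, since this endpoint deletes data.
    [FunctionName("DeleteAllFilesInContainer")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
        ILogger log)
    {
        string containerName = "mycontainer";
        string directoryPath = "mydirectory";

        var client = GetDataLakeServiceClient("mydatalakeaccount", "<insert key>");
        var fileSystemClient = client.GetFileSystemClient(containerName);

        var lstOfTasks = new List<Task>();

        // List the paths directly under directoryPath (pass recursive: true to include
        // subdirectories). await foreach visits every item exactly once.
        await foreach (PathItem item in fileSystemClient.GetPathsAsync(directoryPath))
        {
            // DeleteFile targets files, so skip directory entries.
            if (item.IsDirectory == true)
            {
                continue;
            }

            // Start the delete without awaiting it, so the requests run concurrently.
            lstOfTasks.Add(DeleteFile(item.Name, fileSystemClient));
        }

        // Wait for every delete request to complete.
        await Task.WhenAll(lstOfTasks.ToArray());

        string responseMessage = $"You deleted {lstOfTasks.Count} files";
        return new OkObjectResult(responseMessage);
    }

    // Builds a service client that authenticates with the storage account key.
    public static DataLakeServiceClient GetDataLakeServiceClient(string accountName, string accountKey)
    {
        StorageSharedKeyCredential sharedKeyCredential =
            new StorageSharedKeyCredential(accountName, accountKey);
        string dfsUri = $"https://{accountName}.dfs.core.windows.net";
        DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClient(
            new Uri(dfsUri),
            sharedKeyCredential);
        return dataLakeServiceClient;
    }

    // Deletes a single file; fileName is the full path of the file within the container.
    public static async Task DeleteFile(
        string fileName, DataLakeFileSystemClient fileSystemClient)
    {
        await fileSystemClient.GetFileClient(fileName).DeleteAsync();
    }
}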
The important bits are below.
var lstOfTasks = new List<Task>();
Create a list of Tasks.
lstOfTasks.Add(DeleteFile(item.Name, fileSystemClient));
For each blob, create an asynchronous task to delete it.
await Task.WhenAll(lstOfTasks.ToArray());
Wait until every delete task has finished.
This method makes one request for each file to be deleted and fires these requests all at the same time.
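The fan-out pattern described here can be tried without touching a storage account. The following is a minimal sketch, not the function's actual code: `FanOutDemo`, `RunAllAsync`, and the `deleteAsync` delegate are hypothetical names, and the delegate stands in for `DeleteFile`. The optional `maxConcurrency` cap (a `SemaphoreSlim`) is an addition that is not in the original function; it is one possible way to limit how many requests are in flight if firing everything at once leads to throttling.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class FanOutDemo
{
    // Starts one task per name and waits for all of them to finish.
    // deleteAsync is a hypothetical stand-in for DeleteFile(name, fileSystemClient).
    // maxConcurrency is an assumption added for this sketch; the original function has no cap.
    public static async Task<int> RunAllAsync(
        IEnumerable<string> names,
        Func<string, Task> deleteAsync,
        int maxConcurrency = int.MaxValue)
    {
        if (maxConcurrency < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxConcurrency));
        }

        using var gate = new SemaphoreSlim(maxConcurrency);
        var lstOfTasks = new List<Task>();

        foreach (string name in names)
        {
            // Not awaited here: each task is started and collected.
            lstOfTasks.Add(RunOneAsync(name, deleteAsync, gate));
        }

        await Task.WhenAll(lstOfTasks.ToArray());
        return lstOfTasks.Count;
    }

    private static async Task RunOneAsync(
        string name, Func<string, Task> deleteAsync, SemaphoreSlim gate)
    {
        // Wait for a free slot, run the delete, then release the slot.
        await gate.WaitAsync();
        try
        {
            await deleteAsync(name);
        }
        finally
        {
            gate.Release();
        }
    }
}
```

Calling `await FanOutDemo.RunAllAsync(names, n => Task.Delay(100))` with 1,000 names should finish in roughly the time of one delay rather than 1,000 of them, which is where the speed-up over a sequential loop comes from.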
The method can be adjusted to delete only some blobs, based on whatever filter you need.
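As one illustration of such a filter, the sketch below decides which paths to delete before any delete task is created. It is self-contained and does not call Azure: `PathInfo` is a hypothetical stand-in for the SDK's `PathItem` (which exposes `Name`, `IsDirectory`, and `LastModified`), and `DeleteFilter`, the ".tmp" rule, and the cutoff date are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for PathItem, so the sketch runs without the Azure SDK.
public sealed class PathInfo
{
    public string Name { get; set; }
    public bool? IsDirectory { get; set; }
    public DateTimeOffset LastModified { get; set; }
}

public static class DeleteFilter
{
    // Example rule (an assumption): delete only .tmp files last modified before the cutoff.
    public static bool ShouldDelete(PathInfo item, DateTimeOffset cutoff) =>
        item.IsDirectory != true
        && item.Name.EndsWith(".tmp", StringComparison.OrdinalIgnoreCase)
        && item.LastModified < cutoff;

    // Returns the names of the paths that pass the filter.
    public static List<string> SelectNames(IEnumerable<PathInfo> items, DateTimeOffset cutoff) =>
        items.Where(i => ShouldDelete(i, cutoff)).Select(i => i.Name).ToList();
}
```

In the function above, the same kind of check would go inside the enumeration loop, just before `lstOfTasks.Add(...)`, so that paths failing the filter never get a delete task.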
In a container with 130k blobs, it took less than 4 minutes to delete everything.