CH S Sankalp jonna
Posted on April 4, 2021
Updating a bunch of database entries together can be tricky compared to updating a single row. This is because there are two things that need to be kept in mind.
- The operation needs to be atomic. You cannot have another process updating the same set of rows while running a bulk update on them.
- The operation needs to be efficient. Running multiple queries on the database for updating multiple rows should be avoided.
Another factor at play here is whether you want to update these entries with the same values or different ones.
For instance, you might want to mark a bunch of old rows in your database as deleted. In this case you need to update all the rows with is_deleted = true.
But what if you want to update each row differently based on a certain condition?
Suppose you have a python dictionary where keys are existing user ids and values are scores to be updated for those users.
The dictionary would look like this - {1: 150, 2: 200, 3:500 …….}. Each entry in the “User” table would have to be updated with their corresponding score without running multiple update queries.
There are a couple of ways to achieve a bulk update efficiently. We will go through each of them, but first let's start with the most inefficient way to do this.
The inefficient way
Let us take the example of a table representing cake recipes where you want to delete all recipes older than March 1st 2021.
The most layman way to achieve this is as follows:
from datetime import datetime
starting_date = datetime.strptime("2021-03-01", "%Y-%m-%d").date()
for obj in Recipe.objects.filter(created_date__lt=starting_date)
obj.is_deleted=True
obj.save()
This operation will not be atomic because while this loop is running, there could be other processes trying to update the same rows.
This operation will not be efficient either because it will make an update query to the database for every single iteration of this loop.
Therefore, using this method should be avoided at all costs.
The efficient way
The same thing can be achieved by making a single update operation to the database while making sure this operation is atomic and no other operations run while this is in progress.
Here is how you do it:
from datetime import datetime
starting_date = datetime.strptime("2021-03-01", "%Y-%m-%d").date()
Recipe.objects.filter(created_date__lt=starting_date).update(is_deleted=True)
As you can see, not only is this efficient but it also takes a lesser amount of code. This is the best way to achieve a bulk update when you want to update all the rows with the same value.
Using bulk_update
As you may have noticed, the above example does not work for cases where you want to update different rows with different values.
Thankfully there are ways to do this with a single operation using the bulk_update method that Django provides.
Let us take the example of having to update scores of each user with a different value. This can be done so as follows:
user_ids_dict = {
1: 100,
2: 150,
3: 500
# this dict can contain n key value pairs.
}
# create a list of user objects that need to be updated in bulk update
user_bulk_update_list = []
for key, value in user_ids_dict:
user = User.objects.get(id=key)
user.score = value
# append the updated user object to the list
user_bulk_update_list.append(user)
# update scores of all users in one operation
User.objects.bulk_update(user_bulk_update_list, ['score'])
There is a problem with this approach though. You are running a select query on the database for each user ID before running the bulk update operation.
This can be avoided using an atomic transaction instead.
Using an atomic transaction
There is another way to achieve the problem of updating multiple rows with different values. You can do this by running each update operation inside an atomic transaction block.
Doing this will ensure that all the update operations performed in the loop will be executed in the database as a single transaction as opposed to performing each of these operations separately.
Code looks something like this:
user_ids_dict = {
1: 100,
2: 150,
3: 500
# this dict can contain n key value pairs.
}
from django.db import transaction
with transaction.atomic():
for key, value in user_ids_dict:
User.objects.filter(id=key).update(score=value)
As you can see, with this method there are no select queries being run at all. We are simply running multiple update queries in a single transaction.
This method is the most efficient way to achieve bulk update when your requirement is to update different rows with different values.
Closing notes
To bulk update rows with the same values based on a particular condition, use ModelName.objects.update() instead of running a save operation on each row separately.
To bulk update rows where each row has to be updated with a different value, use an atomic transaction where you run update queries for each row within a transaction block.
Posted on April 4, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.