How to detect field changes in Django

mmzeynalli

Miradil

Posted on January 10, 2024


The Problem

While working on Django projects, every now and then we need to know whether a specific field of a model has changed, and act accordingly. Let's say you are developing a logistics website and want to record every status change of a package. You would then have a model structure similar to this:

from django.contrib.auth import get_user_model
from django.db import models

UserModel = get_user_model()

class Status(models.Model):
    name = models.CharField(max_length=32, unique=True)

    def __str__(self):
        return self.name


class Package(models.Model):
    user = models.ForeignKey(UserModel, on_delete=models.CASCADE)
    shipment_cost = models.DecimalField(max_digits=6, decimal_places=2)
    weight = models.DecimalField(max_digits=5, decimal_places=2)
    status = models.ForeignKey(Status, on_delete=models.CASCADE)

    def __str__(self):
        # Assumed from the shell output below, e.g. "1: Created"
        return f"{self.id}: {self.status}"


class PackageStatusHistory(models.Model):
    package = models.ForeignKey(Package, on_delete=models.CASCADE)
    from_status = models.ForeignKey(Status, on_delete=models.CASCADE, related_name='from_status', null=True)
    to_status = models.ForeignKey(Status, on_delete=models.CASCADE, related_name='to_status')
    created_at = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        # Assumed from the shell output below, e.g. "1: 1 -> 2"
        return f"{self.package_id}: {self.from_status_id} -> {self.to_status_id}"

Then one would add a post_save signal handler to record the status change:

from django.db.models.signals import post_save
from django.dispatch import receiver


@receiver(post_save, sender=Package)
def register_status_change(sender, instance: Package, **kwargs):
    old_status_id = ...  # where do we get the old value from?
    if instance.status_id != old_status_id:
        # There is a status change
        PackageStatusHistory.objects.create(package_id=instance.id,
                                            from_status_id=old_status_id,
                                            to_status_id=instance.status_id)

The problem is that we do not know the old status id! If somewhere in the code we do package.status = new_status, the in-memory value of the status field changes and the old value is lost. There are several ways to tackle this problem; let's analyze some of them.

Note:
We manually added 5 statuses: Created, InPreparation, Shipped, Received, Delivered.
We also added one package with id=1 for testing.

Solution 1: The easy way?

Anyone who has used the post_save signal a lot knows that the signal receives the updated fields as an argument: **kwargs contains update_fields, which is a frozenset. That would help us, right?

Not really. This method has several drawbacks:

  1. If you set a field to its current value (that is, the value stays the same and nothing actually changes), it is still listed in update_fields, which is not what we want.
  2. This frozenset is not generated automatically by Django. It contains only the fields that you explicitly passed to the save() method, and it is None when save() is called without update_fields. This makes the code hard to develop and maintain. I am not going to test this; however, you are free to do so.
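To make both drawbacks concrete, here is a plain-Python sketch of the check a handler might be tempted to write. naive_status_changed is hypothetical, and the only Django behaviour it relies on is that update_fields is None when save() is called without it.

```python
from typing import FrozenSet, Optional


def naive_status_changed(update_fields: Optional[FrozenSet[str]]) -> bool:
    """Hypothetical check a post_save handler might do with update_fields."""
    # Django passes update_fields=None when save() was called without it
    return update_fields is not None and "status" in update_fields


# package.status_id = 2; package.save()  -> the handler gets update_fields=None
print(naive_status_changed(None))                    # False, yet the status changed
# package.save(update_fields=["status"]) without touching the status
print(naive_status_changed(frozenset({"status"})))  # True, yet nothing changed
```

In both cases the answer is wrong, because update_fields describes what was written, not what changed.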

Solution 2: Query the old value

We know that when we assign a new value to a field of a Django model instance, only the in-memory object changes; the database keeps the old value until save() is called. We can use this to our advantage: what if we read the old status from the database and compare it with the "dirty" Django model?

@receiver(post_save, sender=Package)
def register_status_change(sender, instance: Package, **kwargs):
    old_status_id = Package.objects.get(id=instance.id).status_id
    if instance.status_id != old_status_id:
        # There is a status change
        PackageStatusHistory.objects.create(package_id=instance.id,
                                            from_status_id=old_status_id,
                                            to_status_id=instance.status_id)

If we check:

>>> from cache_fields.models import *
>>> package = Package.objects.get()
>>> package
<Package: 1: Created>
>>> package.status_id
1
>>> package.status_id = 2
>>> package.status
<Status: InPreparation>
>>>
>>> PackageStatusHistory.objects.filter(package_id=package.id)
<QuerySet []>
>>> package.save()
>>> package.status
<Status: InPreparation>
>>> PackageStatusHistory.objects.filter(package_id=package.id)
<QuerySet []>
>>>

What? Why is our status history empty? Here is why: we are using post_save, so the database query for the old value happens AFTER the new value has been written to the database, and the "old" and new values are always equal. We could switch to the pre_save signal (so that we read the old value before the database is updated); however, we then need to handle one extra case: when the package is created for the first time, it has no id yet, so there is nothing to query and no id to pass to PackageStatusHistory:

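from django.db.models.signals import pre_save
from django.dispatch import receiver


@receiver(pre_save, sender=Package)
def register_status_change(sender, instance: Package, **kwargs):
    # Skip the query for a brand-new package: it has no id yet
    old_status_id = instance.id and Package.objects.get(id=instance.id).status_id

    # A new package cannot be referenced by a history row yet, so only
    # record the change for packages that already exist in the database
    if instance.id is not None and instance.status_id != old_status_id:
        # There is a status change
        PackageStatusHistory.objects.create(package_id=instance.id,
                                            from_status_id=old_status_id,
                                            to_status_id=instance.status_id)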
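Why does moving the lookup to pre_save help? It is all about ordering: Model.save() sends pre_save, then writes the row, then sends post_save. The toy model below mimics just that order without Django. The names database, save and the two handlers are hypothetical, and the real save() does much more than this.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for the package table: package id -> stored status_id
database: Dict[int, int] = {1: 1}

Handler = Callable[[int, int], None]


def save(package_id: int, new_status_id: int,
         pre_save: Optional[Handler] = None,
         post_save: Optional[Handler] = None) -> None:
    """Toy version of the order save() follows: pre_save, write, post_save."""
    if pre_save is not None:
        pre_save(package_id, new_status_id)
    database[package_id] = new_status_id  # the actual UPDATE
    if post_save is not None:
        post_save(package_id, new_status_id)


seen: Dict[str, int] = {}


def read_in_pre_save(package_id: int, new_status_id: int) -> None:
    seen["pre_save"] = database[package_id]   # row not written yet: old value


def read_in_post_save(package_id: int, new_status_id: int) -> None:
    seen["post_save"] = database[package_id]  # row already written: new value


save(1, 2, pre_save=read_in_pre_save, post_save=read_in_post_save)
print(seen)  # {'pre_save': 1, 'post_save': 2}
```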

Now, if we try this:

>>> from cache_fields.models import *
>>> package = Package.objects.get()
>>> package
<Package: 1: Created>
>>> package.status_id = 2
>>> package.status
<Status: InPreparation>
>>> PackageStatusHistory.objects.filter(package_id=package.id)
<QuerySet []>
>>> package.save()
>>> package.status
<Status: InPreparation>
>>> PackageStatusHistory.objects.filter(package_id=package.id)
<QuerySet [<PackageStatusHistory: 1: 1 -> 2>]>
>>>

Notice how we used old_status_id = instance.id and Package.objects.get(id=instance.id).status_id.
It is a handy shortcut for an if statement: the and operator evaluates its right-hand operand ONLY IF the left-hand operand is truthy; otherwise it returns the left-hand operand itself.
In our case, if id is None, the second part is not executed, so old_status_id = None.
If id is a valid value, the second part is executed: old_status_id = Package.objects.get(id=instance.id).status_id
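The same short-circuit behaviour can be seen in plain Python. fetch_status_id and its data are hypothetical stand-ins for the Package.objects.get(...) lookup. They only record whether the lookup actually ran.

```python
from typing import Dict, List, Optional

calls: List[int] = []
fake_table: Dict[int, int] = {1: 2}  # hypothetical: package id -> status_id


def fetch_status_id(package_id: int) -> int:
    # Stand-in for Package.objects.get(id=package_id).status_id
    calls.append(package_id)
    return fake_table[package_id]


def old_status_id_for(package_id: Optional[int]) -> Optional[int]:
    # Same shape as: instance.id and Package.objects.get(id=instance.id).status_id
    return package_id and fetch_status_id(package_id)


print(old_status_id_for(None))  # None, and fetch_status_id is never called
print(old_status_id_for(1))     # 2
print(calls)                    # [1]
```

Because and returns the left operand itself when it is falsy, the result for a new object is None, not False. An id of 0 would also skip the lookup, which is harmless in practice because auto-increment ids normally start at 1.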

So, does it work? Yes, but it is not the best implementation. Every time the package is saved (even if the status did not change), we execute an extra database query, which can slow down large systems.

Solution 3: Override __init__

This method lets us avoid the extra database calls. Since Django models are just Python classes holding the data of a database row, we can add our own attributes to store the old data:

from typing import Any


class Package(models.Model):
    user = models.ForeignKey(UserModel, on_delete=models.CASCADE)
    shipment_cost = models.DecimalField(max_digits=6, decimal_places=2)
    weight = models.DecimalField(max_digits=5, decimal_places=2)
    status = models.ForeignKey(Status, on_delete=models.CASCADE)

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        # Remember the status id the object started with
        self.cached_status_id = self.status_id

When the object is instantiated (for example, when it is fetched from the database), we keep a "duplicate" of the status field. Then, if the status field is changed anywhere in the code, the old value is still available in cached_status_id. Pretty cool, huh? Notice how we use status_id instead of status: it is easier to handle integers than objects. Moreover, reading self.status in __init__ would trigger an extra query to the Status table whenever the package was fetched without select_related.
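The caching mechanics can be tried without Django. FakePackage below is a hypothetical plain-Python stand-in for Package that takes the same snapshot in __init__. It also shows one thing to keep in mind: the snapshot is taken only once. After a change has been recorded, it should be refreshed. Otherwise, saving the same instance again would compare against a stale value.

```python
class FakePackage:
    """Hypothetical plain-Python stand-in for Package (no Django involved)."""

    def __init__(self, status_id: int) -> None:
        self.status_id = status_id
        # Snapshot taken once, when the object is constructed
        self.cached_status_id = self.status_id


package = FakePackage(status_id=2)   # e.g. as loaded from the database
package.status_id = 3                # somewhere in the code
print(package.cached_status_id, package.status_id)  # 2 3 -> change detected

# Nothing updates the snapshot by itself; refresh it once the change is recorded
package.cached_status_id = package.status_id
print(package.cached_status_id, package.status_id)  # 3 3 -> no change
```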

We can now also use the post_save signal, because the old value is captured as soon as the object is created. Now, if we implement the signal:

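from django.db.models.signals import post_save
from django.dispatch import receiver


@receiver(post_save, sender=Package)
def register_status_change(sender, instance: Package, **kwargs):
    if instance.status_id != instance.cached_status_id:
        # There is a status change
        PackageStatusHistory.objects.create(package_id=instance.id,
                                            from_status_id=instance.cached_status_id,
                                            to_status_id=instance.status_id)
        # Refresh the snapshot so that saving the same instance again
        # does not record the same change a second time
        instance.cached_status_id = instance.status_id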

Which results in:

>>> from cache_fields.models import *
>>> package = Package.objects.get()
>>> package
<Package: 1: InPreparation>
>>> package.status_id
2
>>> package.cached_status_id
2
>>> package.status_id = 3
>>> PackageStatusHistory.objects.filter(package_id=package.id)
<QuerySet [<PackageStatusHistory: 1: 1 -> 2>]>
>>> package.save()
>>> package.status_id
3
>>> PackageStatusHistory.objects.filter(package_id=package.id)
<QuerySet [<PackageStatusHistory: 1: 1 -> 2>, <PackageStatusHistory: 1: 2 -> 3>]>
>>>

We avoided the extra database calls, yay! We can extend this method by saving the model's dict in a variable (using the model_to_dict function), which gives us access to the old value of every field. You can also extract this logic into a mixin, as explained in this StackOverflow answer. However, I personally prefer this method of keeping a separate attribute for each cached field: it is a little harder to maintain, but efficient.
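Here is a plain-Python sketch of the generalized version. It assumes a hypothetical TrackChangesMixin that snapshots a declared list of attributes in __init__ and reports which ones changed, similar in spirit to the dirty-fields approach below. Parcel is a plain class standing in for a model. With Django, the mixin would go before models.Model in the bases so that its __init__ wraps Model.__init__, just like the __init__ override above. This is a sketch of the idea, not a drop-in Django implementation.

```python
from typing import Any, Dict, Tuple


class TrackChangesMixin:
    """Hypothetical mixin: snapshot selected attributes when the object is built."""

    tracked_fields: Tuple[str, ...] = ()

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.reset_tracking()

    def reset_tracking(self) -> None:
        # Store the current value of every tracked field
        self._initial = {name: getattr(self, name) for name in self.tracked_fields}

    def changed_fields(self) -> Dict[str, Any]:
        # Map each changed field to its *old* value
        return {
            name: old
            for name, old in self._initial.items()
            if getattr(self, name) != old
        }


class Parcel:
    def __init__(self, status_id: int, weight: float) -> None:
        self.status_id = status_id
        self.weight = weight


class TrackedParcel(TrackChangesMixin, Parcel):
    tracked_fields = ("status_id", "weight")


parcel = TrackedParcel(status_id=3, weight=1.5)
parcel.status_id = 4
print(parcel.changed_fields())  # {'status_id': 3}
parcel.reset_tracking()         # e.g. after the change has been recorded
print(parcel.changed_fields())  # {}
```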

Solution 4: Third-Party Libraries

There is also a similar mixin-based implementation available as a third-party library: Django Dirty Fields (the Package model inherits from the library's DirtyFieldsMixin). There is not much to explain, so let's just test it:

The code:

@receiver(post_save, sender=Package)
def register_status_change(sender, instance: Package, **kwargs):
    # Maps each changed field to its old value, e.g. {'status': 3}
    dirty_fields = instance.get_dirty_fields(check_relationship=True)

    # Only log when the status itself changed; other dirty fields
    # (e.g. weight) are not status changes
    if "status" in dirty_fields:
        # There is a status change
        PackageStatusHistory.objects.create(package_id=instance.id,
                                            from_status_id=dirty_fields["status"],
                                            to_status_id=instance.status_id)

The result:

>>> from cache_fields.models import *
>>> package = Package.objects.get()
>>> package
<Package: 1: Shipped>
>>> package.status_id
3
>>> package.cached_status_id
3
>>> package.is_dirty(check_relationship=True)
False
>>> PackageStatusHistory.objects.filter(package_id=package.id)
<QuerySet [<PackageStatusHistory: 1: 1 -> 2>, <PackageStatusHistory: 1: 2 -> 3>]>
>>> package.status_id = 4
>>> package.is_dirty(check_relationship=True)
True
>>> package.get_dirty_fields(check_relationship=True)
{'status': 3}
>>> package.save()
>>> package.status_id
4
>>> PackageStatusHistory.objects.filter(package_id=package.id)
<QuerySet [<PackageStatusHistory: 1: 1 -> 2>, <PackageStatusHistory: 1: 2 -> 3>, <PackageStatusHistory: 1: 3 -> 4>]>
>>>

It works, nice! You can find the whole code in this repository. Feel free to add your ideas and suggestions!
