22% of Django websites can't roll back prod thanks to these 2 mistakes
Code Review Doctor
Posted on January 5, 2021
Django Doctor audits code and auto fixes Django anti-patterns. We checked 666 Django projects for migrations that cannot be rolled back and were shocked that 22% of the Django codebases had the following problems:
- Data migration is missing reverse handler: migration has handler for updating the data, but not for undoing that change. Read more
- Data migration importing "hot" model: migration imports a Model from the codebase instead of doing
apps.get_model(...)
. Read more
These prevent disaster recovery. Picture the scene: it鈥檚 11pm and production is down. Tired and stressed and out of options you decide to roll back to the last good release. All you need is to reverse the database migrations and then deploy and...wait what?
$ python manage.py migrate hounds 0002
Unapplying hounds.0003_auto...
Traceback (most recent call last):
django.db.migrations.exceptions.IrreversibleError:
Operation <RunPython > in hounds.0003_auto is not reversible
This blocks rolling back to the last good release because the database state would be out of phase with the last good code: if your tried to rollback to the old code then the database state won't be compatible with the old codebase.
1. Missing reverse migration
Django data migrations have two aspects: forwards and backwards, but by default the skeleton data migration generated by Django only shows the forwards handler, and hence one in five Django codebases have at least one migration that does not support undoing the migration:
def forwards(apps, schema_editor):
...
class Migration(migrations.Migration):
dependencies = [("cases", "0009_auto_20200320_1210")]
operations = [migrations.RunPython(forwards)]
Unfortunately it only takes one missing reverse
to make all migrations fail when attempting to undo the migrations. Human error during code review can account for the fact so many Django codebases are unable to recover from disaster in production: if 100% of mistakes were spotted 100% of the time by 100% of code reviewers then we would not need code review in the first place because such 脺bermensch would not make the mistake in the first place.
Django data migrations require forwards, but backwards is technically optional. While the above is valid, below is infinitely better:
def forwards(apps, schema_editor):
...
def backwards(apps, schema_editor):
...
class Migration(migrations.Migration):
dependencies = [("cases", "0009_auto_20200320_1210")]
operations = [migrations.RunPython(forwards, backwards)]
So backwards is technically optional in the same way that wearing a parachute while skydiving is optional: given enough time you will hit the ground. Indeed, if you want the option of rolling back your production website then no, backwards is not optional. Omitting backwards means you only have the option of rolling forwards, which is a bit more daunting compared with safely rolling back to a release we know works, especially when performing disaster recovery under a stressed state. Indeed, if mistakes can be made during a calm code review then imagine the scope for mistakes when under pressure.
The fix is a super simple too: just specifying migrations.RunPython.noop
can be enough. As the name implies, Django will do nothing for the reverse, simply skip over the data migration, without throwing IrreversibleError
:
class Migration(migrations.Migration):
dependencies = [("cases", "0009_auto_20200320_1210")]
operations = [migrations.RunPython(forwards, migrations.RunPython.noop)]
So in practice follow this advice:
馃 You can enable Django Doctor on your GitHub repos to protect against this problem 100% of the time.
Importing hot model
Can you see the problems with this Django migration?
from django.db import migrations
from territory import models
def forwards(apps, schema_editor):
for item in in models.ChickenCoopLocations.objects.all():
item.has_chickens = does_have_chickens(item.pk)
item.save()
class Migration(migrations.Migration):
dependencies = [("cases", "0005_auto_does_have_chickens.py")]
operations = [migrations.RunPython(forwards)]
def does_have_chickens(pk):
...
It's directly importing from models.py. Given enough time and enough code changes this will eventually break the migrations as the code diverges from the historic state of the database when the data migration was generated.
Out of step
The fields in Django's models.py
must agree with the schema in the database. When Django performs database read or write operations it uses the shape of the model in models.py
to determine what fields to SELECT
and INSERT
. If models.py
includes fields that are not yet in the database schema then the database will throw an error.
This is easily missed if the code is reviewed while in a rush, because when 0006_populate_has_chickens
is ran this bug will not happen. Indeed, at that point in time models.py
does agree with the schema, but what happens when in one weeks time your team member adds a new field to ChickenCoopLocations
? From that point on whenever 0006_populate_has_chickens
is ran, models.py
will have a have a field that the database schema does not yet have, so migrations will fail. This will happen when setting up a new database when all the migrations run from scratch such as in the CI, when a new developer joins the team, or when you replaced your bricked laptop.
Future proof
In 0006_populate_has_chickens.py
it's better to use apps.get_model
, which asks Django to construct a simplified time-traveling model whose fields will reflect the fields in the database even if models.py
is vastly out of step with the schema:
# 0006_populate_has_chickens.py
from django.db import migrations
def forwards(apps, schema_editor):
ChickenCoopLocations = apps.get_model("territory", "ChickenCoopLocations")
for item in in ChickenCoopLocations.objects.all():
item.has_chickens = does_have_chickens(item.pk)
item.save()
class Migration(migrations.Migration):
dependencies = [("cases", "0005_auto_foo.py")]
operations = [migrations.RunPython(forwards)]
def does_have_chickens(pk):
...
So directly importing models in migrations is flaky and in a few migrations time will probably fail because during migrations the code in models.py is out of step with the database schema: the models.py
can have a field defined that does not yet exist in the database because the required migration has not yet ran.
Does your Django code prevent disaster recovery?
Over time it's easy for tech debt to slip into your codebase. I can check that for you at django.doctor, or can review your GitHub PRs:
Or try out Django refactoring challenges.
Posted on January 5, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.