Writing a Django Data Migration with Real World Example


Angelika Jarosz

Posted on May 24, 2019

Most of the time when we think of migrations in Django, we are referring to schema migrations. Django can create these for you automatically because they describe a change to the structure of your database, not a change to the data itself. Another type of migration you may find yourself using is a data migration. Data migrations are useful when you are loading new data or want to change existing data in your database to fit a specific schema.

I came across this problem when I was building ickly, a search interface into NYC Restaurant Health Inspection Data. I wanted users of my app to be able to search for a restaurant by name and see all of its inspection data. The dataset was a CSV file in which each row corresponded to an inspection, but it did have a 'camis' field that uniquely identified a business. I wanted to transform this data to match my Business and Inspection models, which meant I needed to extract all of the unique businesses.

If you are just loading a fixture or some sample data that is already in the structure you need, you may not need a data migration and can use Django's loaddata command instead.

Creating a data migration

Django can't automatically generate data migrations for you, but you can write one yourself. You can run the following command to generate an empty migration file in which you will add operations.

python manage.py makemigrations --empty yourappname

The main operation we will look at, and the one you will use for your data migration, is RunPython. Here is what the auto-generated file will look like:

# Generated by Django A.B on YYYY-MM-DD HH:MM
from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        ('yourappname', '0001_initial'),
    ]

    operations = [
    ]

RunPython expects a callable as its argument. The function you write takes two arguments: an app registry and a schema editor. We then add the RunPython operation to the operations list, passing in our function, so that it is executed when we run python manage.py migrate from the command line.

from django.db import migrations

def my_function(apps, schema_editor):
    # logic will go here
    pass


class Migration(migrations.Migration):

    dependencies = [
        ('yourappname', '0001_initial'),
    ]

    operations = [
        migrations.RunPython(my_function),
    ]

The app registry maintains the historical versions of all your available models. Inside our function we want to fetch the historical version with apps.get_model('your_app_name', 'your_model_name') instead of importing the model directly. We do this to make sure we are using the version of the model that this migration expects. If you use a direct import, you may be importing a newer version whose fields no longer match the database at that point in the migration history.
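To make the registry lookup concrete, here is a minimal sketch of a forward function. The app label "api" and the Business model are borrowed from the example later in this post; the whitespace cleanup itself is a hypothetical task chosen only for illustration.

```python
def strip_business_names(apps, schema_editor):
    # Ask the registry for the model as it existed at this point in the
    # migration history, rather than importing it from api.models.
    Business = apps.get_model("api", "Business")

    for business in Business.objects.all():
        cleaned = business.name.strip()
        # Only write back rows that actually changed.
        if cleaned != business.name:
            business.name = cleaned
            business.save(update_fields=["name"])
```

A function like this would be passed to migrations.RunPython in the operations list, exactly like my_function above.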

The SchemaEditor can be used to manually effect database schema changes. With the exception of highly advanced cases, you most likely will not want to interact with this directly. The SchemaEditor exposes operations as methods and turns things like "create a model" or "alter a field" into SQL.

The RunPython operation can also take a second callable. This second function contains the logic you want to run when migrating backwards. If you do not provide one, attempting to migrate backwards will raise an exception. If you want to learn more about the RunPython operation and its other optional arguments, check out the Django documentation on migration operations.
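As a rough sketch of a forward and backward pair, the two functions below create one row and then remove exactly that row again. The SEED_CAMIS value and the "Example Diner" name are made up for this illustration, and the sketch assumes the remaining Business fields can be left at their defaults. When going backwards needs no work at all, Django also provides migrations.RunPython.noop to pass as the second callable.

```python
SEED_CAMIS = "00000001"  # hypothetical identifier used only in this sketch


def add_seed_business(apps, schema_editor):
    Business = apps.get_model("api", "Business")
    Business.objects.create(camis=SEED_CAMIS, name="Example Diner")


def remove_seed_business(apps, schema_editor):
    Business = apps.get_model("api", "Business")
    # Delete only the row the forward function created, not the whole table.
    Business.objects.filter(camis=SEED_CAMIS).delete()


# In the Migration class these would be wired up as:
#     operations = [
#         migrations.RunPython(add_seed_business, remove_seed_business),
#     ]
```

Keeping the reverse function narrowly scoped like this means migrating backwards does not touch rows that were added by other means.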

Example

Let's look at an example of a migration taken directly from my code for ickly. I've added comments to point out all the relevant parts we went over in this post.

# -*- coding: utf-8 -*-
# Generated by Django 1.10.1 on 2017-04-20 21:02
from __future__ import unicode_literals

from django.db import migrations, models
import csv
from datetime import datetime


def load_initial_data(apps, schema_editor):
    # get the correct versions of models using the app registry
    Business = apps.get_model("api", "Business")
    Inspection = apps.get_model("api", "Inspection")

    # This is where your migration logic will go.
    # For my use case I needed to get the unique businesses and
    # transform data from the CSV file into the schema I wanted.
    with open('DOHMH_NYC_Restaurant_Inspection_Results.csv') as csv_file:
        reader = csv.reader(csv_file)
        header = next(reader)

        businesses = []
        inspections = []

        for row in reader:
            camis = row[0]
            business = next((b for b in businesses if b.camis == camis), None)
            if not business:
                business = Business(camis=row[0], name=row[1],
                            address="{} {} {} {}".format(row[3], row[4], row[2], row[5]),
                            phone=row[6], cuisine_description=row[7])
                businesses.append(business)

            inspection = Inspection(business=business,
                                record_date=datetime.strptime(row[16],"%m/%d/%Y").date(),
                                inspection_date=datetime.strptime(row[8],"%m/%d/%Y").date(),
                                inspection_type=row[17], action=row[9], violation_code=row[10],
                                violation_description=row[11], critical_flag=row[12],
                                score=int(row[13]) if row[13] else None,
                                grade=row[14],
                                grade_date = datetime.strptime(row[15],"%m/%d/%Y").date() if row[15] else None)
            inspections.append(inspection)

        Business.objects.bulk_create(businesses)
        Inspection.objects.bulk_create(inspections)

# Logic for migrating backwards
def reverse_func(apps, schema_editor):
    Business = apps.get_model("api", "Business")
    Inspection = apps.get_model("api", "Inspection")

    # Delete the inspections first, then the businesses they reference
    Inspection.objects.all().delete()
    Business.objects.all().delete()

class Migration(migrations.Migration):
    # Django automatically adds dependencies for your migration
    # when you generate the empty migration
    dependencies = [
        ('api', '0002_auto_20170420_2101'),
    ]

    # the RunPython operation with the two callables passed in
    operations = [
        migrations.RunPython(load_initial_data, reverse_func)
    ]
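One thing worth noting about the migration above: it finds an existing business by scanning the businesses list for every row, which gets slow as the file grows. A possible alternative, sketched here with plain dictionaries and a made-up three-column sample instead of the real file and models, is to key the businesses on camis so each lookup is a single dictionary access. The parse_date and group_rows helpers are hypothetical names, not part of ickly.

```python
import csv
import io
from datetime import datetime

# Hypothetical three-column sample; the real file has many more columns.
SAMPLE = """camis,name,inspection_date
1001,Joe's Pizza,04/01/2017
1002,Cafe Luna,
1001,Joe's Pizza,03/15/2016
"""


def parse_date(value):
    # Empty cells become None instead of raising ValueError.
    return datetime.strptime(value, "%m/%d/%Y").date() if value else None


def group_rows(csv_file):
    businesses = {}   # camis -> business data, one entry per unique business
    inspections = []  # one entry per CSV row
    for row in csv.DictReader(csv_file):
        camis = row["camis"]
        # setdefault returns the stored entry if this camis was seen before.
        business = businesses.setdefault(
            camis, {"camis": camis, "name": row["name"]}
        )
        inspections.append({
            "business": business,
            "inspection_date": parse_date(row["inspection_date"]),
        })
    return businesses, inspections


businesses, inspections = group_rows(io.StringIO(SAMPLE))
```

In a migration, the dictionary values would be model instances rather than plain dicts, and list(businesses.values()) would be what gets handed to bulk_create.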

There is a lot more to know about Django data migrations, but you now know enough to decide whether you need to write one and to get started if you do. If you want to learn more about Django migrations in general, the documentation provides a great overview.

If you have any questions, comments, or feedback - please let me know. Follow for new weekly posts about JavaScript, React, Python, and Django!

Cover Photo by Taylor Vick on Unsplash
