Versioning a Relational Data Model

Suppose we have an application dealing in Birds. We have a Bird class, which could be in any language, but which we'll consider in Python for examples sake.

class Bird:
    def __init__(
        self,
        unique_id: int,
        species: Species, # Assume defined elsewhere
        name: str,
    ) -> None:
        self.unique_id = unique_id
        self.species = species
        self.name = name
        return

Inside our database, we might have created a birds table. Example SQL in this article will use Postgres syntax, but is mostly generic SQL.

create table birds (
    unique_id serial primary key,
    species integer not null references species(unique_id),
    name text not null
);

Suppose our application is terribly useful and popular, and many customers are very happy being able to uniquely identify and name their birds. However, some greedy, ungrateful customers would like to be able to rename their birds.

Inside the application, we might add a new method to our Bird class:

# Continues on from `Bird` defined earlier

    def rename(
        self,
        new_name: str,
        database: Database # Some database adapter type
    ) -> Self: 

        # Execute a database query changing the bird's name
        # and return the updated bird

Inside the database, we make the following changes:

create table names (
    bird integer not null references birds(unique_id),
    name text not null,
    created timestamp with time zone not null,
    primary key(bird, created)
);

insert into names (bird, name, created)
select
    unique_id,
    name,
    now() at time zone 'utc'
from   birds;

alter table birds drop column name;

Fantastic! We now have 5ish Normal Form renaming of birds. Everything is wonderful. Except... Some things are now pear shaped.

Our database is now incompatible with prior versions of our application code
We must ensure that any new database instance created for the bird application uses the newer data model

In a very small system like this one, these problems are not really problems. It's easy to keep track of what database instance is using what data model, and make manual adjustments where necessary.

What about when we have thousands of tables spread across dozens of schemas? These problems become very big problems.

It's easy to keep application code versioned using Git. Could we do something similar for a relational data model? Yes indeed. The first step is to make the database "self-aware". We can do this by creating a new table:

create table meta_revisions (
    executed timestamp with time zone not null,
    version integer not null unique
);

Whenever we modify the data model, we write to this table. Every revision gets an associated version number. For example, the revision that created the initial birds table might be 1 and the revision adding the names table might be 2.

You could put all your revisions in a folder, and then name each revision file with its version.

/project/data_model_revisions/1.sql
/project/data_model_revisions/2.sql

Git will naturally handle the process of versioning these files. So long as you never go back and edit a revision file, the folder becomes a sequence of instructions that can build up your data model from scratch. Almost like an SQL dump file, except one that takes the meandering course of your application's development.

I like to include a little tool in my codebases that handles data model revisions. In pseudo code, it goes something like this:

last_executed_revision = database.retrieve_last_revision();
sequence_number = integer(last_executed_revision)

for file in revisions_folder:
    file_sequence = integer(file.name_without_extension)
    if file_sequence > last_executed_revision:
        for statement in file.body.split(';'):
            database.execute(statement)
        database.record_revision(file_sequence)

Of course, in the real world, you can make this tool much more complicated and useful. It can automatically handle rollbacks on errors, do tricky query formatting, or some other task that eases management of your data model.

However you do it, the end result is data model bliss. You can always be sure that your database schemas are in sync with your application code, and you can rebuild schemas to the specification of any prior version of your application code.

Blog

Versioning a Relational Data Model

Hugh Jeremy

Join Our Newsletter. No Spam, Only the good stuff.

Related