Why DB Migration Versioning when we have GIT?

phsaurav

PH Saurav

Posted on November 25, 2024

Why DB Migration Versioning when we have GIT?

Database migration versioning is often a notorious pain point in software development. Anyone who worked in a big enough team merging PRs knows it is quite gross to manage and a frequent source of conflict.

Naturally, this begs the question: if we already have robust version control systems like Git, why do we need an additional layer of version control for database migrations within a version control?

I once used to be one of those developers questioning this necessity and researched the topic deeply. Recently, after coming across a recent video by a popular developer and YouTuber Theo titled "We Don't Need Migrations Anymore". I thought of sharing my understanding and taking on this in a simple manner with an example.

Let’s dive into why DB Migration versioning may be gross but may be less gross than anything else we have.

Background Information

When working on a project alone, managing the database schema during development usually doesn’t require much thought. However, once the first release is made, every schema change must be applied not only to the local instance but also to other environments like development, staging, and production. At this point, it becomes essential to establish a system to orchestrate synchronization across the team and between environments.

Then, we have two common approaches we can take to orchestrate DB schema change:

1. Declarative Schema Management
In this approach, a central database schema represents the desired state. Any changes to the desired schema are compared with the current state, and the differences are used to generate the appropriate SQL queries to transition the database from its current state to the desired state. This is very similar to how Terraform works. This method has been gaining significant popularity recently, especially among infrastructure engineers.

2. Imperative Schema Management (Migrations)
In this approach, there is no central database schema. Instead, changes are applied iteratively and represented as a series of instruction sets. Each set of SQL instructions is referred to as a "Migration." Migrations track both the previous and the next revision or version, forming a sequential chain of instructions that are executed in a specific "ORDER."

Now, as the basics are out of the way, let's see why one should choose one or the other.

At first glance, the declarative approach seems much simpler. You only need to maintain a central schema, and the rest is handled automatically. In contrast, migrations can quickly become messy. For instance, imagine two team members working on separate branches, both originating from a branch where the latest migration is A. Each developer creates a new migration, B and C, respectively. When it’s time to merge the changes, conflicts naturally arise: which migration should come after A —> B or C?

So why not use the declarative approach in every scenario? Let's understand this from an example.

Example Scenario: E-commerce Platform Expansion

Suppose your company runs an e-commerce platform that serves thousands of customers daily. The backend is powered by a relational database for managing products, orders, users, and payments deployed in production and users are already using it.

Initial Database Schema
The products table in the database has the following structure:

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    stock INT NOT NULL
);
Enter fullscreen mode Exit fullscreen mode

Requirement Change
To support global expansion, the business has decided to introduce currency support for products and rename the name column to user_name. The changes are as follows:

  1. Add a currency column to the products table, ensuring it is not nullable. Currency value should be provided on product creation. Old products should be considered as 'USD'.
  2. Rename the name column to user_name, preserving all existing data.

Declarative Approach

As in the declarative approach, we maintain the desired schema, so the updated schema will be:

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    user_name VARCHAR(255) NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    stock INT NOT NULL
    currency VARCHAR(10) NOT NULL
);
Enter fullscreen mode Exit fullscreen mode

Great! Now, we can call it a day. Not so easy!😁

Issues:

  1. Existing Data Incompatibility:
        ERROR: column "stock" contains null values.
Enter fullscreen mode Exit fullscreen mode

Adding a NOT NULL column (currency in products) directly without default values breaks the database if rows already exist.

  1. Less control on specific instruction:

To change name to user_name and achieve the desired state, there are two possible approaches:

  • Drop the name column and create a new user_name column.
  • Alter the name column to user_name.

The first approach can be catastrophic for your application, and you don't have specific controls on which one will be implemented in the declarative approach.

  1. Running in wrong "ORDER":

One important thing about SQL instructions is that they need to run in order. Suppose that, in the generated instruction, the 2nd step runs before the first one. It will cause an error. The declarative approach doesn't give us control over the execution order.

Possible Solution:

These steps can be taken to meet the new requirement for our database.

Step 1:

-- Add the currency column
ALTER TABLE products
    ADD COLUMN currency VARCHAR(10) DEFAULT 'USD' NOT NULL;

-- Rename the 'name' column to 'user_name'
ALTER TABLE products
    RENAME COLUMN name TO user_name;
Enter fullscreen mode Exit fullscreen mode

Step 2 (Optional):

-- Ensure all existing products have default values
UPDATE products SET currency = 'USD';
Enter fullscreen mode Exit fullscreen mode

Step 3:

-- Remove default constraints to make columns explicitly set by API
ALTER TABLE products ALTER COLUMN currency DROP DEFAULT;
Enter fullscreen mode Exit fullscreen mode

Why Migrations?

Now, one can argue we can do these three steps in order one after another in a declarative approach and achieve the desired schema state without any issues. It's true. Then the approach will look something like this:

  1. Add the currency column with default and add the user_name column in the product table.
  2. Deploy the changes.
  3. Ensure the currency column existing data is filled and copy the name data to user_name.
  4. Run instructions in deployed DB.
  5. Drop the name column and drop the default for the currency column.
  6. Deploy the changes.

This may seem something very much doable when you are working with a local DB. But this will introduce a huge headache if you are doing it in a production environment where, most of the time, you don't have access to DB directly and orchestrating these changes to multiple environments can become a nightmare very quickly. Also, multiple people working on this kind of change can lead to possible catastrophes. But this can be a very good approach if the schema is controlled centrally by an infrastructure engineer rather than a developer. I think this is why the declarative approach is increasingly gaining popularity among infrastructure engineers.

Now, migrations solve these issues and make them safe by giving developers the ability to choose "Specific Approach" and "ORDER". Even one can run each migration with an SQL transaction, so if one fails, all the other instructions in the transaction will roll back, making it safer. It is easier to review the migrations and also easier to orchestrate specific implementation in different environments.

💖 💪 🙅 🚩
phsaurav
PH Saurav

Posted on November 25, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related