Why DB Migration Versioning when we have GIT?
PH Saurav
Posted on November 25, 2024
Database migration versioning is often a notorious pain point in software development. Anyone who worked in a big enough team merging PRs knows it is quite gross to manage and a frequent source of conflict.
Naturally, this begs the question: if we already have robust version control systems like Git, why do we need an additional layer of version control for database migrations within a version control?
I once used to be one of those developers questioning this necessity and researched the topic deeply. Recently, after coming across a recent video by a popular developer and YouTuber Theo titled "We Don't Need Migrations Anymore". I thought of sharing my understanding and taking on this in a simple manner with an example.
Let’s dive into why DB Migration versioning may be gross but may be less gross than anything else we have.
Background Information
When working on a project alone, managing the database schema during development usually doesn’t require much thought. However, once the first release is made, every schema change must be applied not only to the local instance but also to other environments like development, staging, and production. At this point, it becomes essential to establish a system to orchestrate synchronization across the team and between environments.
Then, we have two common approaches we can take to orchestrate DB schema change:
1. Declarative Schema Management
In this approach, a central database schema represents the desired state. Any changes to the desired schema are compared with the current state, and the differences are used to generate the appropriate SQL queries to transition the database from its current state to the desired state. This is very similar to how Terraform works. This method has been gaining significant popularity recently, especially among infrastructure engineers.
2. Imperative Schema Management (Migrations)
In this approach, there is no central database schema. Instead, changes are applied iteratively and represented as a series of instruction sets. Each set of SQL instructions is referred to as a "Migration." Migrations track both the previous and the next revision or version, forming a sequential chain of instructions that are executed in a specific "ORDER."
Now, as the basics are out of the way, let's see why one should choose one or the other.
At first glance, the declarative approach seems much simpler. You only need to maintain a central schema, and the rest is handled automatically. In contrast, migrations can quickly become messy. For instance, imagine two team members working on separate branches, both originating from a branch where the latest migration is A. Each developer creates a new migration, B and C, respectively. When it’s time to merge the changes, conflicts naturally arise: which migration should come after A —> B or C?
So why not use the declarative approach in every scenario? Let's understand this from an example.
Example Scenario: E-commerce Platform Expansion
Suppose your company runs an e-commerce platform that serves thousands of customers daily. The backend is powered by a relational database for managing products, orders, users, and payments deployed in production and users are already using it.
Initial Database Schema
The products table in the database has the following structure:
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
price DECIMAL(10, 2) NOT NULL,
stock INT NOT NULL
);
Requirement Change
To support global expansion, the business has decided to introduce currency support for products and rename the name
column to user_name
. The changes are as follows:
- Add a
currency
column to theproducts
table, ensuring it is not nullable. Currency value should be provided on product creation. Old products should be considered as 'USD'. - Rename the
name
column touser_name
, preserving all existing data.
Declarative Approach
As in the declarative approach, we maintain the desired schema, so the updated schema will be:
CREATE TABLE products (
id SERIAL PRIMARY KEY,
user_name VARCHAR(255) NOT NULL,
price DECIMAL(10, 2) NOT NULL,
stock INT NOT NULL
currency VARCHAR(10) NOT NULL
);
Great! Now, we can call it a day. Not so easy!😁
Issues:
- Existing Data Incompatibility:
ERROR: column "stock" contains null values.
Adding a NOT NULL
column (currency
in products
) directly without default values breaks the database if rows already exist.
- Less control on specific instruction:
To change name
to user_name
and achieve the desired state, there are two possible approaches:
- Drop the
name
column and create a newuser_name
column. - Alter the
name
column touser_name
.
The first approach can be catastrophic for your application, and you don't have specific controls on which one will be implemented in the declarative approach.
- Running in wrong "ORDER":
One important thing about SQL instructions is that they need to run in order. Suppose that, in the generated instruction, the 2nd step runs before the first one. It will cause an error. The declarative approach doesn't give us control over the execution order.
Possible Solution:
These steps can be taken to meet the new requirement for our database.
Step 1:
-- Add the currency column
ALTER TABLE products
ADD COLUMN currency VARCHAR(10) DEFAULT 'USD' NOT NULL;
-- Rename the 'name' column to 'user_name'
ALTER TABLE products
RENAME COLUMN name TO user_name;
Step 2 (Optional):
-- Ensure all existing products have default values
UPDATE products SET currency = 'USD';
Step 3:
-- Remove default constraints to make columns explicitly set by API
ALTER TABLE products ALTER COLUMN currency DROP DEFAULT;
Why Migrations?
Now, one can argue we can do these three steps in order one after another in a declarative approach and achieve the desired schema state without any issues. It's true. Then the approach will look something like this:
- Add the
currency
column with default and add theuser_name
column in the product table. - Deploy the changes.
- Ensure the
currency
column existing data is filled and copy thename
data touser_name
. - Run instructions in deployed DB.
- Drop the
name
column and drop the default for thecurrency
column. - Deploy the changes.
This may seem something very much doable when you are working with a local DB. But this will introduce a huge headache if you are doing it in a production environment where, most of the time, you don't have access to DB directly and orchestrating these changes to multiple environments can become a nightmare very quickly. Also, multiple people working on this kind of change can lead to possible catastrophes. But this can be a very good approach if the schema is controlled centrally by an infrastructure engineer rather than a developer. I think this is why the declarative approach is increasingly gaining popularity among infrastructure engineers.
Now, migrations solve these issues and make them safe by giving developers the ability to choose "Specific Approach" and "ORDER". Even one can run each migration with an SQL transaction, so if one fails, all the other instructions in the transaction will roll back, making it safer. It is easier to review the migrations and also easier to orchestrate specific implementation in different environments.
Posted on November 25, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.