Alan Johnson
Posted on February 12, 2023
There are a ton of database migration tools out there, and they all have a similar set of features. A lot of times the decision on what migration tooling to use is decided by the ORM or database driver you're using.
For a while now I've wanted to dig into what a bare minimum migration tool that uses standard Unix tools and psql
, the PostgreSQL CLI, would look like. My goal here is to have something that should work on any general Unix system without having to lean on my ORM or other tools to use it.
Run scripts with psql
To get started, connect to the database with psql
. There are several ways to set connection details, but I like to use use a connection string:
set DATABASE_URL=postgres://postgres:postgres@127.0.0.1:5432/mydb
psql $DATABASE_URL
The rest of this post will take for granted that $DATABASE_URL
is set.
Once successfully connected, the psql
command presents a REPL that allows query input and other psql
specific commands. Hitting Ctrl-d
will exit psql
.
Beyond the REPL mode, psql
allows a -f [filename]
argument to be passed, which causes it to run the given file.
psql $DATABASE_URL -f script.sql
This is great for the goal of running migrations, because it means a migration in a .sql
file composed of a sequence of SQL queries can be run with psql
with the -f
argument.
There are a few other options on the psql
command that look handy. Giving psql
the -v ON_ERROR_STOP=1
option, which sets the ON_ERROR_STOP
variable to 1
does exactly what it looks like. If an error occurs in a migration, it will return an error status code.
Also -1
looks helpful. It will run an entire -f
script in a single transaction.
With all of those commands combined, here's the final psql
migration argument set:
psql -1 -v ON_ERROR_STOP=1 -f script.sql
It's a start!
Migration State Management
Most migration systems track state in a table within the database. The table tracks what migrations have and haven't been run, and usually some extra metadata like when they were run.
Here's a table creation query for a basic migration script:
CREATE TABLE IF NOT EXISTS "migrations"(
"id" integer PRIMARY KEY,
"migrated_at" timestamptz NOT NULL DEFAULT now()
);
This table has two columns. An id
column that tracks the version for a recorded migration, and migrated_at
that records the time the migration occurred.
To record to the migrations table, a query like this needs to be run:
INSERT INTO migrations (id) VALUES (1);
It'd be a little annoying to write that each time, though, so why not define
a function within PostgreSQL for inserting a migration version:
CREATE OR REPLACE FUNCTION log_migration(
id integer
) RETURNS void AS $$
BEGIN
INSERT INTO "migrations" (id) VALUES (id);
END;
$$ LANGUAGE plpgsql;
This takes a migration id and records it to the migrations
table. It can be run like this:
SELECT log_migration(1);
Because the migrations table exists and typically when a new migration is run it's expected to be working on an existing state of the database, it can also be helpful to have a function that will assert that the database state is as expected.
CREATE OR REPLACE FUNCTION assert_latest_migration (
id integer
)
RETURNS void AS $$
DECLARE
latest_id integer;
BEGIN
SELECT MAX(migrations.id) INTO latest_id FROM migrations;
ASSERT latest_id = id, 'migration assertion ' || id || ' failed, current latest is ' || latest_id;
RETURN;
END;
$$ LANGUAGE plpgsql;
This one's a little more complicated:
- It takes in an expected migration version as an argument.
- Queries for the latest migration version in the
migrations
table, and set it tolatest_id
. - Compare the expected migration version to
latest_id
. If they aren't equal, raise an error.
This function can be used like this:
SELECT assert_latest_migration(1);
If the latest migration in the migrations
table doesn't match the argument, which in the example above is 1
, an error will be raised. This fits nicely with the -v ON_ERROR_STOP=1
option in psql
.
Putting all of this together, the first migration file is ready to go:
migrations/000-bootstrap.sql
:
CREATE TABLE IF NOT EXISTS "migrations"(
"id" integer PRIMARY KEY,
"migrated_at" timestamptz NOT NULL DEFAULT now()
);
CREATE OR REPLACE FUNCTION log_migration(
id integer
) RETURNS void AS $$
BEGIN
INSERT INTO "migrations" (id) VALUES (id);
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION assert_latest_migration (
id integer
)
RETURNS void AS $$
DECLARE
latest_id integer;
BEGIN
SELECT MAX(migrations.id) INTO latest_id FROM migrations;
ASSERT latest_id = id, 'migration assertion ' || id || ' failed, current latest is ' || latest_id;
RETURN;
END;
$$ LANGUAGE plpgsql;
SELECT log_migration(0);
Running this with the psql
command above will apply it to the database:
psql -1 -v ON_ERROR_STOP=1 -f migrations/000-bootstrap.sql
Adding a Real Migration
Running a migration to set up migrations isn't that satisfying. Here's a first real migration, which creates a table of users:
migrations/001-create-users.sql
:
SELECT assert_latest_migration(0);
CREATE TABLE "users" (
id serial PRIMARY KEY,
username varchar(255) NOT NULL,
encrypted_password varchar(255) NOT NULL,
created_at timestamptz NOT NULL,
updated_at timestamptz NOT NULL
);
SELECT log_migration(1);
This uses the assert_latest_migration
function from above to ensure the database state starts out as expected. It then creates the users
table, and closes by using the log_migration
function to record the new migration version.
Run it with psql -1 -v ON_ERROR_STOP=1 -f migrations/001-create-users.sql
This minimal migration system seems to be working!
Bulk Executing Migrations
There's a great way to run a single migration, but what about all migrations? There are great Unix command line tools that can help piece together running all migrations.
find
will help get a list of all migrations in a directory:
find migrations -iname "*.sql"
This finds all files with a .sql
extension in the migrations directory. Find doesn't sort that list, though, so pipe the output from find into sort
.
find migrations -iname "*.sql" | sort
This outputs a list of all the filenames sorted. Since psql
needs a -f
in front of each filename, printf
along with xargs
can set that up:
find $1 -iname "*.sql" | sort | xargs printf -- '-f %s\n'
And then another xargs
runs the migrations:
find $1 -iname "*.sql" | sort | xargs printf -- '-f %s\n' | xargs psql $DATABASE_URL -1 -v ON_ERROR_STOP=1;
This will run all migrations and exit if any of them fail. They'll be run in a single transaction.
Cleaning Up
It's probably not practical to remember and type these psql
based commands out, so some simple bash
scripts can help give quick commands for running a single migration or running all migrations:
bin/migrate
:
#!/bin/bash
set -e
if [ -z ${DATABASE_URL+x} ];
then
echo "DATABASE_URL must be set";
exit 1;
fi;
if [ -z ${1+x} ] || [ ! -f $1 ];
then
echo "Usage: migrate [script.sql]";
exit 1;
fi;
psql $DATABASE_URL -1 -v ON_ERROR_STOP=1 -f $1;
Running this script will make sure the given argument is a file, and if not it'll give a usage message. Otherwise it'll run the migration script.
bin/migrate-all
:
#!/bin/bash
set -e
if [ -z ${DATABASE_URL+x} ];
then
echo "DATABASE_URL must be set";
exit 1;
fi;
if [ -z ${1+x} ] || [ ! -d $1 ];
then
echo "Usage: migrate-all [dir]";
exit 1;
fi;
find $1 -iname "*.sql" | sort | xargs printf -- '-f %s\n' | xargs psql $DATABASE_URL -1 -v ON_ERROR_STOP=1;
This script expects a directory as an argument, and runs all of the migrations in the given directory, in order.
There's a problem with this script, though, that will take a while to hit. There is a limit to how long an argument can be, and xargs
is simply concatenating all files into a huge list. There are limits to how long this argument list can be, though. Run getconf ARG_MAX
to see the maximum length of a command. On my computer right now it's 1048576
. It's long, but not infinite.
It's ideal to just use a loop in bash instead:
bin/migrate-all
(take 2):
#!/bin/bash
set -e
if [ -z ${DATABASE_URL+x} ];
then
echo "DATABASE_URL must be set";
exit 1;
fi;
if [ -z ${1+x} ] || [ ! -d $1 ];
then
echo "Usage: migrate-all [dir]";
exit 1;
fi;
files=$(find $1 -iname "*.sql"); #| xargs printf -- '-f %s\n' | xargs psql $DATABASE_URL -1 -v ON_ERROR_STOP=1;
for file in $files;
do
psql $DATABASE_URL -1 -v ON_ERROR_STOP=1 -f $file;
done;
And with that, the minimal migration solution is ready to use.
What's Missing
This isn't the most full featured migration system.
Run only missing migrations
This doesn't currently support running the latest migrations. In practice I don't generally miss that feature, because I prefer to drop my database and rebuild it in development. During deployment it's good to take a minute and think through the migrations that need to be run and review them, rather than just running a migrate command.
Roll Back
It's certainly missing any way to roll back migrations. My general experience is that I seldom to never migrate back, and that frequently migrations aren't able to be undone, so this doesn't bother me too much, but of course your team's mileage may vary there.
The broad strokes of how this system works has been in place at Cardinal for several years now, but we have some JavaScript based scripts rather than bash to tie it all together. It has worked great for us, so I'm confident it can help tons of other teams as well.
The Code
If you'd like to see the full example code from this, check out nalanj/migrations. Feel free to throw an issue or a discussion into that repo.
Posted on February 12, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
November 30, 2024
November 29, 2024
November 29, 2024
November 29, 2024