♻️🐘 Disposable local Postgres databases (without containers!)

Remo (monacoremo) · Posted on January 17, 2020

In this post, my aim is to show you how to get 'disposable' Postgres databases whenever you need them, e.g. for iterative development and testing. I will also give an example of how to use them to get automatic code reloading for your database schema.

I'll take you through the individual steps of setting up custom local instances of Postgres, as working through them helped me understand Postgres a lot better. If you'd like to take a shortcut and skip this learning opportunity, you can also directly use the awesome pg_tmp tool! 🙂

The approach to running custom local instances of Postgres that I'll demonstrate here is easy, fast, and reliable - so much so that you should be able to reduce the complexity of your development setup in several ways:

  • It will eliminate the need for using databases in containers in many cases. Containers are super convenient, but they come with some extra complexity that we should avoid when we can.
  • If you currently substitute your production database with something like SQLite for development and testing, this approach will allow you to work in an environment that is much closer to production and enables you to use the full power of Postgres.
  • It will completely eliminate the need for fiddling around with the database cluster that comes with your OS (e.g. with the postgresql package in Debian and derivatives).

Running a local one-off Postgres database

I invite you to follow along with the steps below in your shell! We will put all of them into a convenient script later, but going through them one by one will give us the best opportunity to experiment and learn.

Step 1 / 4: Create a temporary directory

Let's create a temporary directory in which our one-off database cluster will live:

tmpdir=$(mktemp -d)

mktemp -d creates a new directory and prints the new path to stdout. We capture that output with $(...) and assign it to the tmpdir variable.
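For example, you can echo the captured path to see what mktemp chose (the exact name is random and varies by system):

echo "$tmpdir"   # e.g. /tmp/tmp.Kf3x9q2LbD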

Everything that our Postgres instance does will happen in this directory; it will not save any configuration, data, or logs anywhere else. If we want the directory to be deleted automatically when we are done with the database, we can set up the following trap:

trap "rm -rf $tmpdir" exit

Traps are a special feature of the shell. In this case, we tell the shell to always run rm -rf "$tmpdir" when it exits - exit here is a shell pseudo-signal that fires when the shell exits, not an actual Unix signal.
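Here's a quick way to see the trap in action, using a subshell so that the cleanup fires as soon as the subshell exits:

(
  demo=$(mktemp -d)
  trap 'echo "cleaning up $demo"; rm -rf "$demo"' exit
  echo "working in $demo"
)
# at this point, the demo directory has already been removed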

Step 2 / 4: Set up environment

export PGDATA="$tmpdir"

This is the only environment variable we need for now; initdb, postgres, and pg_ctl will all use it to find the directory of our one-off database cluster.
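If you prefer to be explicit, all three tools also accept the data directory via a -D flag (the command line takes precedence over $PGDATA):

initdb -D "$tmpdir"
pg_ctl -D "$tmpdir" status
postgres -D "$tmpdir"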

Step 3 / 4: Initialize the database cluster

With the Postgres binaries on our $PATH, we can set up our new database cluster:

initdb --no-locale

This will set up the basic directory and file structure that Postgres needs in our temporary directory (which initdb finds via $PGDATA).
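If you're curious, you can peek at what initdb created (listing abbreviated; the exact contents vary a bit by Postgres version):

ls "$PGDATA"
# base  global  pg_hba.conf  pg_ident.conf  pg_wal  postgresql.conf  ...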

If you don't have the Postgres binaries on your $PATH yet, your OS should have them in a package. On Debian, for example, you can get them with sudo apt install postgresql.

You might not need the --no-locale option, but in my experience it makes the initialization more reliable on some systems.
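According to the initdb documentation, --no-locale is simply equivalent to --locale=C. If you do need a specific locale or encoding, you can pass them explicitly instead:

initdb --locale=C --encoding=UTF8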

Step 4 / 4: Run the database server

postgres -c listen_addresses="" -k "$PGDATA"

This tells Postgres not to listen on any TCP port. Instead, we ask it to listen on a Unix domain socket in the directory we specify with -k. For simplicity, we just reuse the $PGDATA directory. As we will be in a unique temporary directory on each run, we will never have any port or socket collisions (or any data left over from previous runs!).
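While the server is running, you can verify from another terminal that the Unix domain socket was created in our data directory (Postgres names it after the port number, 5432 by default):

ls "$PGDATA"/.s.PGSQL.5432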

That's it! Our shiny new database cluster is ready to use. 🎉🎉🎉

As a bonus, on most Linux distributions it will run in memory (and therefore extremely fast!), as the temporary directory we use will usually be on a tmpfs.
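You can check whether that's the case on your machine (illustrative output; on some distributions /tmp is a regular on-disk filesystem):

df -h "$tmpdir"
# Filesystem  Size  Used  Avail  Use%  Mounted on
# tmpfs        16G  1.2M    16G    1%  /tmp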

Connecting to our new database

To connect over the Unix domain socket, we need to use a URI that refers to our socket directory (for which we lazily reused $PGDATA). Let's set it as an environment variable for convenience:

export DB_URI="postgresql:///postgres?host=$PGDATA"

Let's test if we can connect with psql:

> psql "$DB_URI"
psql (12.1)
Type "help" for help.

postgres=# 

Success! That URI should also work with any other tool or library that takes a Postgres connection URI.
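If a tool doesn't accept URIs, the same connection can usually be written in libpq's keyword/value form as well:

psql "host=$PGDATA dbname=postgres"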

Loading our database schema

We could load our database schema now, but it's actually a bit cleaner to do that before starting the server. That way, we prevent other services in our stack from getting a connection before the database is fully ready. So let's insert an intermediate step:

Step 3.5 / 4: Load our database schema

mkdir "$PGDATA/setupsocket"

pg_ctl start -o "-c listen_addresses=\"\" -k $PGDATA/setupsocket"

psql "postgresql:///postgres?host=$PGDATA/setupsocket" -f app.sql

pg_ctl stop

Using a separate socket ensures that none of the other services in our stack can connect before our database is ready. The pg_ctl utility is useful here, as it waits for database startup and shutdown to complete before returning. When starting our database server with pg_ctl start, we pass it the same arguments as we would to postgres, as a string via the -o option.
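Throughout this post, app.sql stands in for your actual schema; as a purely hypothetical example, it could be as simple as:

-- app.sql (hypothetical example schema)
create table todos (
    id   bigserial primary key,
    task text not null,
    done boolean not null default false
);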

Putting everything together in one script

Here's the full example:

tmpdir=$(mktemp -d)

trap 'rm -rf "$tmpdir"' exit

export PGDATA="$tmpdir"
export DB_URI="postgresql:///postgres?host=$PGDATA"

initdb --no-locale

mkdir "$PGDATA/setupsocket"
pg_ctl start -o "-c listen_addresses=\"\" -k $PGDATA/setupsocket"
psql "postgresql:///postgres?host=$PGDATA/setupsocket" -f app.sql
pg_ctl stop

postgres -c listen_addresses="" -k "$PGDATA"

Starting up a new Postgres instance this way takes less than a second on my machine. That's of course not as fast as SQLite, but it's probably good enough for most use cases! And we get to use the full power of Postgres, including its extensions and the ability to have more than one write connection.

To run tests immediately after the database is ready, e.g. a run-tests.sh script that connects to the database, we can replace the last line with the following:

postgres -c listen_addresses="" -k "$PGDATA" &

cleanup() {
    rm -rf "$tmpdir"
    kill 0
}

trap cleanup exit

./run-tests.sh "$DB_URI"

With this amendment, we first start Postgres in the background with &. Then, we need to adjust our trap to also kill that background process on exit. We do this by defining a function that both deletes the temporary directory and kills all background processes in our process group (kill 0 signals every process in the current process group). We then replace our earlier exit trap with that function. Finally, we can run our tests and be sure that we have a fresh, one-off database at the ready.
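One caveat for CI: since kill 0 signals our whole process group, including the script itself, the script's exit code may not reflect the tests' result. A minimal sketch of a variation that kills only the background Postgres process and so preserves the exit status:

postgres -c listen_addresses="" -k "$PGDATA" &
pg_pid=$!

cleanup() {
    kill "$pg_pid"      # stop only the background Postgres process
    rm -rf "$tmpdir"
}
trap cleanup exit

./run-tests.sh "$DB_URI"   # the script now exits with the tests' status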

Reloading your database on code changes

Based on the script we created (let's save it as run-db.sh), we can easily get a database that is reloaded for every change in our code. For example, you can use entr to restart the database and rerun the tests for any change in the current directory:

find . | entr -r ./run-db.sh
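If watching every file causes too many restarts, you can narrow the watch list, e.g. to SQL files only:

find . -name '*.sql' | entr -r ./run-db.sh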

Taking a shortcut with pg_tmp

pg_tmp is a neat tool that makes the whole process of setting up temporary Postgres databases even easier:

db_uri=$(pg_tmp)
psql "$db_uri" -f app.sql
./run-tests.sh "$db_uri"

This feels like cheating - but it works! In my opinion, it's also much simpler than fiddling around with containers. As with the more manual setup we implemented above, you will never see port collisions or remnants from earlier test runs with pg_tmp. Under the hood, pg_tmp does pretty much what we implemented ourselves, while adding some optimizations to make the startup process even faster! The manual approach I walked you through above can, however, be useful if you need more customization and control (like in this example).
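pg_tmp also accepts a few options; as I understand its manual, -t makes it listen on a TCP port instead of a Unix socket, and -w sets how many seconds the database lingers after the last connection (check pg_tmp's documentation for the details):

db_uri=$(pg_tmp -t -w 300)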

I hope you will enjoy and make good use of your many new free, recyclable Postgres databases! ♻️🐘

Many thanks to Eric Radman for writing the awesome entr and pg_tmp tools!

In an upcoming post, I would also like to show how to easily get all dependencies (in this case, the Postgres binaries, pg_tmp etc.) with Nix. It's not that scary :-)
