PostgreSQL backup notes

Kashyap (kgrz)

Posted on March 16, 2020

(Originally posted at: https://kgrz.io/postgresql-backup-notes.html)

Some of my very rough notes from going nearly cover-to-cover through the
PostgreSQL backup and restore documentation section. It's one of the most
detailed pieces of documentation I've ever read, so this acts as a very
high-level summary. Most of the content is useful as general context around
data backups of any kind.

I'm using the /data directory to signify the data storage directory. The
actual location depends on the chosen configuration, however.

Backups

Three main types of backup strategies:

  • SQL dump/load (stop-the-world)
  • Backup /data directory (stop-the-world)
  • Continuous Archiving

The first two typically need lots of extra space on the database server
to store the backup before you can upload it to some off-site storage.

"stop-the-world" in this context is not an official nomenclature. In
these strategies, most likely, the database needs to be shut down at
some point.

Continuous archive-based backups are used for a leader-follower setup, or
even for delta backups: a base backup, plus subsequent data as deltas that
can be used to restore the entire database. Shipping the archives to remote
storage is supported by the database itself, so this is the best strategy if
the database servers are space-constrained.

SQL Dump/Load

This is the easiest strategy. pg_dump takes a backup, and psql (or
pg_restore, for the non-plain-text dump formats) consumes the output of that
backup. This strategy is the simplest to cron-ify without external
dependencies: take a backup, upload the files to remote storage, test the
backup on a different machine, and do this every night (a sketch follows the
list below).

  • pg_dump saves the database as a file of SQL statements (in the default
    plain format). It requires enough space to hold both the database and
    the backup file.

  • File sizes might be limited by the kernel/OS, so that's something to
    consider when deciding to use this.

  • Restoring from the pg_dump output might also need extra configuration
    tweaking around connection timeouts: too short, and the database might
    close the connection before the entire script has run.
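
A minimal sketch of what such a nightly job might look like; the database
name, paths, and backup host are illustrative, and the custom dump format is
used so that pg_restore can consume it later:

# nightly-backup.sh (illustrative)
set -euo pipefail
STAMP=$(date +%F)

# dump in the custom format so pg_restore can be used on the other end
pg_dump --format=custom --file="/backups/app_db-$STAMP.dump" app_db

# ship the dump off the database server
rsync -a "/backups/app_db-$STAMP.dump" backup-host:/backups/

# on a different machine, a test restore would look roughly like:
#   createdb app_db_restore_test
#   pg_restore --dbname=app_db_restore_test app_db-$STAMP.dump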

Copy /data directory

PostgreSQL's directory layout is straightforward once you get to know it.
Most of the data is put in one directory, and this includes the two main
components needed for any future restore: the data files, and the
append-only write-ahead log (WAL) files. If the database is shut down,
you're free to copy the data directory over to another machine and start off
from it. Configuration files aren't always placed in the data directory (it
depends on the packaging), so they might need to be copied separately as
well.

These are strategies that rely on the file-system layout of PostgreSQL, or on
features provided by the file system itself.

Two routes here: frozen snapshots of the file system, or using tools
like rsync, tar etc.

frozen snapshots

  • If the underlying file system supports atomic volume snapshots (btrfs,
    zfs, or Apple's APFS, for example), one can snapshot the entire data
    directory (see the sketch below). There are lots of caveats around how
    well the snapshot mechanism is implemented.

  • The backup can be taken without stopping the server. During a restore,
    this strategy requires replaying the logs, as there might be some
    commits in the append-only log that weren't yet written to the data
    files.
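
A minimal sketch of the snapshot route, assuming the data directory lives on
a ZFS dataset named tank/pgdata; the dataset and host names are illustrative:

# take an atomic snapshot of the live data directory
zfs snapshot tank/pgdata@nightly-2020-03-16

# optionally ship the snapshot to another machine
zfs send tank/pgdata@nightly-2020-03-16 | ssh backup-host zfs receive backups/pgdata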

rsync, tar, et al.

  • rsync, gzip, or tar the data directory. These utilities don't take
    consistent snapshots of the disk, so it's best to shut down the server.
    Shutting down the server forces a full flush of the data to disk.

  • An example two-step process with rsync (the paths and backup host are
    illustrative):

# first pass while the server is still running; copies the bulk of the data
rsync -a /data/ backup-host:/backups/data/
# shut down the server, then copy over whatever changed since the first pass
pg_ctl stop -D /data
rsync -a --checksum /data/ backup-host:/backups/data/

What's interesting is that this two-step process is similar to the one used
in the online backup section. It's like a minimal delta-backup process if we
stretch it enough: the first rsync is a base backup that contains the data
committed up to that point, and the second rsync takes the delta and copies
that over.

Continuous Archiving

This system can be used to set up a replicated system, consisting of a
leader and potentially multiple followers. The data from the leader is
pushed, and each of the followers might pull that data. Where this data is
stored is customisable. There are many ways to set up replication in
PostgreSQL, and the documentation for it is exhaustive. The archiving side
deals with the first half: taking the backup and pushing it somewhere.

This strategy piggy-backs on the fact that the WAL may be used to replay and
restore a database. There are many caveats and configuration tweaks around
how long the WAL files are retained, the size of those files, and their
naming. It's best to ship the log files to an external storage service as
and when they are created. Rather than doing this manually via rsync et al.,
PostgreSQL provides a way: the archive_command setting in the configuration,
which takes a shell command.
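
A minimal sketch of the relevant configuration; the archive directory is
illustrative, and the copy command is the simple local-directory example
rather than anything production-ready:

# postgresql.conf
wal_level = replica
archive_mode = on
# %p is the path of the WAL segment to archive, %f is its file name;
# the `test ! -f` guard refuses to overwrite an already-archived file
archive_command = 'test ! -f /mnt/archive/%f && cp %p /mnt/archive/%f'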

  • WAL logs should be secured during transmission and in remote storage,
    because they contain the actual data (that goes for the main database
    too, fwiw).

  • archive_command should exit with a 0 status code. Otherwise, the command
    gets retried, the pg_wal directory may eventually fill up, and that can
    cause the server to crash!

  • archive_command should be designed to ensure it doesn't overwrite
    existing files on the remote system.

  • Missing WAL logs in the archive might hamper future restores, so regular
    base backups will help keep the error surface area a little smaller.

  • An old base backup plus too many WAL logs to replay increases the
    restore time. It's important to work out the maths behind this to figure
    out how much downtime you might need, and to tweak the base backup
    frequency and WAL file size accordingly (see the rough illustration
    below).
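
A rough illustration with made-up numbers: if the server generates about
20 GB of WAL per day and replay proceeds at roughly 50 MB/s, a week-old base
backup means on the order of 140 GB of WAL to replay, which is about 45-50
minutes of replay time on top of the time needed to copy the base backup and
the archives over.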

General mechanism:

  • One base backup as a starting point
  • Continuous deltas in the form of the append-only log (WAL) files

The base backup marks the checkpoint from which the WAL replay would start
during a restore.

base backup

Two ways to take a base backup:

  • Use the pg_basebackup command from an external machine (or the same
    machine, with a different data directory setting), providing the
    connection info needed to connect to the leader (a sketch follows this
    list).

    • Multiple commands can be run from multiple machines, but this might depend on the replication slots configuration on the leader.
    • Might use one or two connections depending on the variant of backup used: copy the WAL logs at the end (1) or stream the WAL logs in parallel (2).
    • Does not run if the /data directory is not empty.
  • Two-step process via rsync. PostgreSQL provides two SQL statements for
    signalling to the server that the user is taking a backup and that a
    checkpoint has to be created: pg_start_backup, pg_stop_backup (the
    backup destination below is illustrative).

SELECT pg_start_backup('some_label');
rsync -a /data/ backup-host:/backups/data/
SELECT * FROM pg_stop_backup();
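
For the first route, a minimal pg_basebackup invocation might look like the
following; the host name, user, and target directory are illustrative:

# run from the machine that will hold the backup; its /data must be empty
pg_basebackup -h leader.example.com -U replication_user -D /data -X stream -P

-X stream opens a second connection to stream the WAL in parallel with the
copy, which corresponds to variant (2) above; -X fetch copies the WAL at the
end over a single connection, which is variant (1).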

Restore

stop-the-world restores

Data dumps taken with pg_dump, or copies made with the file system strategy
mentioned above, can be restored with pg_restore (or psql, for plain dumps)
or by just starting the server from the copied data directory. Needless to
say, this strategy causes either data loss or needs downtime, depending on
the operations chosen.

  • If a system has a simple pg_dump cron job that ships the archive to
    remote storage, then when the leader crashes or dies, the downtime is
    the sum of the time to detection, the time to copy the archive to the
    new machine, and the pg_restore completion time.

  • If the cron job runs at a fixed time of day, any data written on the
    leader between the last backup and the crash is a potential data loss.

  • When the leader crashes/dies but you still have access to the physical
    data disks, recovery using a file system snapshot is possible, and that
    may potentially recover all the data up to the point of the last commit.
    Because this recovery would also have the WAL files handy, the replay
    will make sure as much data as possible is recovered.

Continuous Archive restores

If the system is set up with continuous archiving, it may be possible to
recover all the data. Restore times depend on how fast the base backup
archive and the WAL logs can be copied over to the new server, and on how
long the WAL replay takes.
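
A minimal sketch of the restore side on the new server, assuming a
PostgreSQL 12-style setup and an illustrative archive path:

# postgresql.conf on the new server
# %f is the WAL file the server asks for, %p is where it wants it placed
restore_command = 'cp /mnt/archive/%f %p'

# after restoring the base backup into /data:
#   touch /data/recovery.signal     # tells the server to enter recovery
#   pg_ctl start -D /data           # replays the archived WAL on startup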

Replication

There are many ways to do this, too, depending on the underlying infra:
shared disk (two machines accessing the same disk), file system
replication (a write to one drive is mirrored to a different machine
atomically), side-car middlewares that execute a given statement on
multiple machines simultaneously, or even application-level middlewares
that do this. Streaming/Point-in-time replication is one preferred
approach that can piggyback on the continuous archive backup strategy.

The streaming/point-in-time replication strategy ships WAL logs from the
leader to a remote location using archive_command, and those logs are then
replayed continuously on a follower.

  • Note that streaming replication is possible without using archive_command, provided the data ingestion throughput never exceeds the rate at which the follower can stream the logs directly from the leader and apply them locally (this also depends on the network latency).

If the follower is not able to keep up with the logs, the logs on the leader
might get recycled, and the follower will keep waiting for a
now-non-existent WAL file. Force-starting the follower in that failure case
will result in data loss.
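
A minimal sketch of the follower side, again assuming a PostgreSQL 12-style
setup; the connection details, slot name, and archive path are illustrative:

# follower's postgresql.conf
# stream WAL directly from the leader
primary_conninfo = 'host=leader.example.com user=replication_user'
# fall back to the WAL archive if the leader has already recycled a segment
restore_command = 'cp /mnt/archive/%f %p'
# a physical replication slot on the leader keeps segments the follower
# still needs (create it on the leader with:
#   SELECT pg_create_physical_replication_slot('follower_1'); )
primary_slot_name = 'follower_1'

# mark the data directory as a standby and start the server:
#   touch /data/standby.signal
#   pg_ctl start -D /data

A replication slot avoids the recycled-WAL problem described above, but it
cuts both ways: if the follower falls far behind or disappears, the leader
keeps accumulating WAL for it and can run out of disk space.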
