Vehbi Sinan Tunalioglu
Posted on July 30, 2024
This is a technical note on how to archive PostgreSQL backups on NixOS to one or more targets using rclone.
Problem
Some of my objectives during provisioning new hosts and services are to ensure that:
- The system is reproducible,
- The process and related artifacts are documented, and
- The data is backed up and archived securely.
Using NixOS solves the first two objectives due to its declarative nature. However, the third one is not directly addressed by NixOS alone. It is depends on how services are configured and data is managed.
If I am lucky, the host is stateless, i.e., all data is provisioned from an external source such as a separate database server. This means the entire host is disposable: when desired, I can launch a new host on my favourite cloud provider, redeploy services, and make necessary DNS record changes.
However, this is not always the case. Sometimes, the host is running a database server, typically a PostgreSQL service along with other services. In such cases, I need to ensure that the data is backed up regularly and archived safely and securely to one or more targets.
In relation to PostgreSQL, I want to ensure that:
- Database snapshots (backups) are taken regularly,
- Backups are encrypted, and
- Encrypted backups are archived to a single target or preferably to multiple targets.
Context
There are two ways of approaching this problem:
- Use an external service that performs the backup and archiving for this host (acting as a supervisor), or
- Use a service running on the host itself to perform the backup and archiving.
I prefer the second one because it is more cost-effective, gives me more control over the process, and allows me to stick to the NixOS way of doing things to maintain ubiquity.
In terms of archive targets, I prefer to use object stores, such as S3-compliant services. This is because they are cost-effective, reliable, provide a good level of security, and are easy to integrate with other services. Additionally, I want to have multiple targets to avoid vendor lock-in and to have a backup plan in case one of the targets fails. It would be nice if I could add other target types easily in the future.
Finally, I do not want that the data to leave the host unencrypted. Archive files should be encrypted before they leave the host using a secure, widely supported, and hopefully post-quantum secure encryption algorithm.
Solution
Consider following scenario:
- We have a PostgreSQL service running on a NixOS host,
- There are two databases that we want to backup:
db1
anddb2
, - We want to take a snapshot of these databases 15 minutes past every hour,
- We want to encrypt the snapshots using GPG with symmetric encryption and a passphrase,
- We want to archive the encrypted snapshots to two S3-compliant object stores:
s3://let-say-backblaze.com/bucket1
ands3://let-say-wasabi.com/bucket2
.
I learned from Abhinav's Note that services.postgresqlBackup
service creates systemd
services for each database backup, and we can add an ExecStartPost
entry. This entry will be our workhorse.
This is the Nix function that will do it:
## file: ./lib/pg-db-archive.nix
{ config
, pkgs
, dbName
, fileEncKey
, fileRclone
}:
let
script = pkgs.writeShellScriptBin "pg-db-archive" ''
#!/usr/bin/env bash
## Fail on any error:
set -e
## Define the backup directory path:
_dirBackup="${config.services.postgresqlBackup.location}"
## Define the path to the file to be encrypted:
_fileBackup="''${_dirBackup}/${dbName}.sql.gz"
## Define the path to the encrypted file to be archived:
_fileArchive="''${_dirBackup}/${dbName}_$(date --utc +%Y%m%dT%H%M%SZ).sql.gz.enc"
## Encrypt the file:
echo "Encrypting database dump..."
${pkgs.gnupg}/bin/gpg --symmetric --cipher-algo AES256 --batch --yes --passphrase-file ${fileEncKey} --output "''${_fileArchive}" "''${_fileBackup}"
echo "Database file is encrypted successfully."
## Archive the file:
echo "Archiving encrypted database dump..."
${pkgs.rclone}/bin/rclone --config "${fileRclone}" copy "''${_fileArchive}" "archive-target-database:/${dbName}/"
echo "Encrypted database file is archived successfully."
## Remove the local archive file:
echo "Removing local encrypted database dump..."
rm -f "''${_fileArchive}"
echo "Local encrypted database file is removed successfully."
'';
in
{
ExecStartPost = ''
${script}/bin/pg-db-archive
'';
}
We can then use this function in our configuration.nix
file:
## file: ./configuration.nix
{ ... }:
{
## ...
## Our typical PostgreSQL service:
services.postgresql = {
enable = true;
enableTCPIP = true;
ensureDatabases = [
"db1"
"db2"
];
};
## Setup the PostgreSQL backup service:
services.postgresqlBackup = {
enable = true;
startAt = "*-*-* *:15:00";
pgdumpOptions = "--no-owner";
databases = [
"db1"
"db2"
];
};
systemd.services."postgresqlBackup-db1".serviceConfig = (import ./lib/pg-db-archive.nix) {
config = config;
pkgs = pkgs;
dbName = "db1";
fileEncKey = "/run/secrets/keys/db1";
fileRclone = "/run/secrets/rclone/database-archive.conf";
};
systemd.services."postgresqlBackup-db2".serviceConfig = (import ./lib/pg-db-archive.nix) {
config = config;
pkgs = pkgs;
dbName = "db2";
fileEncKey = "/run/secrets/keys/db2";
fileRclone = "/run/secrets/rclone/database-archive.conf";
};
## ...
}
... whereby:
-
The contents of
/run/secrets/keys/{db1,db2}
are the passphrase for the GPG encryption:
hebele-hubele
-
The contents of
/run/secrets/rclone/database-archive.conf
are the configuration file for the rclone:
[archive-backblaze] type = s3 provider = Other env_auth = false acl = private no_check_bucket = true access_key_id = SOME_ACCESS_KEY_ID secret_access_key = SOME_SECRET_ACCESS_KEY endpoint = https://let-say-backblaze.com [archive-target-database-backblaze] type = alias remote = archive-backblaze:bucket1/databases [archive-wasabi] type = s3 provider = Other env_auth = false acl = private no_check_bucket = true access_key_id = ANOTHER_ACCESS_KEY_ID secret_access_key = ANOTHER_SECRET_ACCESS_KEY endpoint = https://let-say-wasabi.com [archive-target-database-wasabi] type = alias remote = archive-wasabi:bucket2/databases [archive-target-database] type = union action_policy = all create_policy = all search_policy = all upstreams = archive-target-database-backblaze:/ archive-target-database-wasabi:/
Once this configuration is activated, we can do following:
-
List
systemd
timers:
systemctl list-timers postgresqlBackup-*
-
Run the
systemd
backup and archive service for a specific database:
systemctl start postgresqlBackup-db1.service systemctl start postgresqlBackup-db2.service
-
See the service logs:
journalctl --unit postgresqlBackup-db1.service journalctl --unit postgresqlBackup-db2.service
Wrap-up
I have been using this recently and I am quite happy with the result. It is simple and manageable. This approach is also quite portable with very little effort.
In my actual setups, I use sops-nix (along with my opsops) to produce the GPG key and rclone configuration files. This way, I can manage the secrets in a secure and comfortable way.
The rclone configuration is quite flexible too: If I need another target, I can just add another remote and update the archive-target-database
union target. It will be automatically picked up by the pg-db-archive
script as it only relies on the archive-target-database
remote. Note that the new target can be any target supported by rclone.
Posted on July 30, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.