From Zero to Hero: Disaster Recovery for PostgreSQL with Streaming Replication in Kubernetes
Sergey Pronin
Posted on May 17, 2024
In today’s digital landscape, disaster recovery is essential for any business. As our dependence on data grows, the impact of system outages or data loss becomes more severe, leading to major business interruptions and financial setbacks.
Managing disaster recovery becomes even more complex with multi-cloud or multi-regional PostgreSQL deployments. Percona Operators offer a solution to simplify this process for PostgreSQL clusters running on Kubernetes. The Operator allows businesses to handle multi-cloud or hybrid-cloud PostgreSQL deployments effortlessly, ensuring that crucial data remains accessible and secure, no matter the circumstances.
This article will guide you through setting up disaster recovery using Percona Operator for PostgreSQL and streaming replication.
Design
The design is simple:
- Two sites: Main and DR (disaster recovery). These can be two regions, two data centers, or even two namespaces.
- Each site runs its own Operator and PostgreSQL cluster.
- In the DR site, the cluster runs in standby mode.
- Streaming replication is set up between the two clusters.
Set it up
All examples in this blog post are available, as usual, in the blog-data/pg-k8s-streaming-dr GitHub repository.
Prerequisites:
- A Kubernetes cluster or clusters (depending on your topology)
- Percona Operator for PostgreSQL deployed. See the quickstart guides, or just use the bundle.yaml from the repository above:
kubectl apply -f https://raw.githubusercontent.com/spron-in/blog-data/master/pg-k8s-streaming-dr/bundle.yaml
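Before moving on, it is worth confirming that the Operator is running. A quick check (the Deployment name is an assumption based on the default bundle, which installs into the current namespace):

kubectl get deploy percona-postgresql-operator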
Primary
The only thing specific to the Main cluster is that you need to expose it so that the standby can connect to the primary node. To expose the primary node, use the spec.expose section:
spec:
  ...
  expose:
    type: ClusterIP
Use a Service type of your choice. In my case, I have two clusters in different namespaces, so ClusterIP is sufficient. Deploy the cluster as usual:
kubectl apply -f main-cr.yaml -n main-pg
The service that the standby should connect to is called <clustername>-ha (main-ha in my case).
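You can list it with:

kubectl get svc main-ha -n main-pg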
main-ha ClusterIP 10.118.227.214 <none> 5432/TCP 163m
Standby
TLS certificates
To get replication working, the Standby cluster needs to authenticate with the Main one. For that, both clusters must have certificates signed by the same certificate authority (CA). The default replication user _crunchyrepl will be used.
In the simplest case, you can copy the certificates from the Main cluster. You need to look out for two secrets:
- main-cluster-cert
- main-replication-cert
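One way to copy them is to export each Secret, strip the server-side metadata, rename it, and apply it in the DR namespace. A minimal sketch, assuming the main-pg and dr-pg namespaces used throughout this post:

kubectl get secret main-cluster-cert -n main-pg -o yaml \
  | sed -e 's/^  name: main-/  name: dr-/' \
        -e '/namespace:/d' -e '/resourceVersion:/d' \
        -e '/uid:/d' -e '/creationTimestamp:/d' \
  | kubectl apply -f - -n dr-pg
# Repeat the same for main-replication-cert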
Copy them to the namespace where the DR cluster is going to run and reference them under spec.secrets (I renamed them, replacing main with dr):
spec:
  secrets:
    customTLSSecret:
      name: dr-cluster-cert
    customReplicationTLSSecret:
      name: dr-replication-cert
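A quick sanity check that both renamed secrets landed in the DR namespace:

kubectl get secrets dr-cluster-cert dr-replication-cert -n dr-pg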
If you are generating your own certificates, just remember the following rules (a generation sketch follows the list):
- Certificates for both the Main and Standby clusters must be signed by the same CA.
- customReplicationTLSSecret must have a Common Name (CN) that matches _crunchyrepl, the default replication user.
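For reference, here is a minimal openssl sketch of those two rules: one self-signed CA shared by both sites, and a replication certificate with CN=_crunchyrepl signed by it. File names and validity periods are arbitrary, and the exact key layout the Operator expects in the secret is an assumption; check the documentation below:

# One CA, reused for both sites
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=shared-postgres-ca" -keyout ca.key -out ca.crt
# Replication certificate with CN=_crunchyrepl, signed by that CA
openssl req -newkey rsa:2048 -nodes -subj "/CN=_crunchyrepl" \
  -keyout tls.key -out repl.csr
openssl x509 -req -in repl.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -days 365 -out tls.crt
# Pack it into a secret for spec.secrets.customReplicationTLSSecret
kubectl create secret generic dr-replication-cert -n dr-pg \
  --from-file=ca.crt --from-file=tls.crt --from-file=tls.key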
Read more about certificates in the documentation.
Configuration
Apart from setting the certificates correctly, you should also set the standby configuration:
standby:
  enabled: true
  host: main-ha.main-pg.svc
- standby.enabled controls whether this is a standby cluster.
- standby.host must point to the primary node of the Main cluster. In my case it is the main-ha service in another namespace.
Deploy the DR cluster:
kubectl apply -f dr-cr.yaml -n dr-pg
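Recent Operator versions register a pg short name for their cluster resources, so you can watch the DR cluster come up with:

kubectl get pg -n dr-pg --watch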
Verify
Once both clusters are up, you can verify that replication is working.
- Insert some data into the Main cluster
- Connect to the DR cluster
To connect to the DR cluster, use the same credentials that you used to connect to Main; this also verifies that authentication works. You should see whatever data you have in the Main cluster in the Disaster Recovery cluster as well.
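A minimal end-to-end sketch of that check, with a few assumptions spelled out: the clusters are named main and dr, the application user and database are both named main (so the Operator stores the password in a main-pguser-main secret), and the DR cluster exposes a dr-ha service:

# Grab the application user's password from the Main site
PGPASSWORD=$(kubectl get secret main-pguser-main -n main-pg \
  -o jsonpath='{.data.password}' | base64 -d)

# Write a row on Main
kubectl run pg-client --rm -it --restart=Never --image=postgres:16 -n main-pg -- \
  psql "host=main-ha.main-pg.svc user=main dbname=main password=$PGPASSWORD" \
  -c "CREATE TABLE dr_test (id int); INSERT INTO dr_test VALUES (1);"

# Read it back on DR with the same credentials (the standby is read-only)
kubectl run pg-client --rm -it --restart=Never --image=postgres:16 -n dr-pg -- \
  psql "host=dr-ha.dr-pg.svc user=main dbname=main password=$PGPASSWORD" \
  -c "SELECT * FROM dr_test;"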
Conclusion
Disaster recovery is crucial for maintaining business continuity in today's data-driven environment. Implementing a robust disaster recovery strategy for multi-cloud or multi-regional PostgreSQL deployments can be complex. However, the Percona Operator for PostgreSQL simplifies this process by enabling seamless management of PostgreSQL clusters on Kubernetes. By following the steps outlined in this article, you can set up disaster recovery using Percona Operator and streaming replication, ensuring your critical data remains secure and accessible. This approach not only provides peace of mind but also safeguards against significant business disruptions and financial losses.
At Percona, we are aiming to provide the best open source databases and tooling possible. As the next level of simplification and user experience for databases on Kubernetes, we recently released Percona Everest (currently in Beta). It is a cloud native database platform with a slick UI. It deploys and manages databases on Kubernetes for you without the need to look into YAML manifests.