Production-Ready Terraform Module for Seamless Disaster Recovery: Primary and Secondary Clusters with Zero Downtime

aidudo

Nicholas Osi

Posted on November 18, 2024

Creating a production-ready Terraform module for setting up a Disaster Recovery (DR) environment with primary and secondary clusters without downtime involves several components. This comprehensive guide provides you with a ready-to-use Terraform template that you can literally copy and deploy in your environment. The template is designed for AWS, but it can be adapted for other cloud providers with minimal changes.

Disclaimer: While this template is designed to be as plug-and-play as possible, it’s crucial to review and understand each component to ensure it aligns with your specific requirements and compliance standards.

  1. Prerequisites

Before deploying the Terraform template, ensure you have the following:

Terraform Installed: Version 1.0 or later. (https://learn.hashicorp.com/tutorials/terraform/install-cli)
AWS Account: With necessary permissions to create resources.
AWS CLI Configured: For authentication. (https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html)
Git Installed: To clone the repository (optional).

  2. Directory Structure

Organize your Terraform code for maintainability and scalability. Here’s the recommended structure:

terraform-dr-setup/
├── main.tf
├── variables.tf
├── outputs.tf
├── backend.tf
├── modules/
│ ├── networking/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── compute/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── database/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── s3_replication/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ └── route53_failover/
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
└── README.md

  3. Terraform Configuration Files

Below are the detailed configurations for each component.

3.1 Provider and Backend Configuration

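backend.tf

terraform {
  required_version = ">= 1.0"

  # Declare the provider source explicitly so terraform init resolves hashicorp/aws.
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }

  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "dr-setup/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-lock-table"
    encrypt        = true
  }
}

# Default provider: primary region
provider "aws" {
  region = var.primary_region
}

# Aliased provider: secondary (DR) region
provider "aws" {
  alias  = "secondary"
  region = var.secondary_region
}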

Explanation:
Backend: Uses AWS S3 for storing the Terraform state and DynamoDB for state locking.
Providers: Defines two AWS providers for primary and secondary regions.
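
The S3 bucket and DynamoDB table named in the backend block must already exist when terraform init runs. A minimal sketch of a one-time bootstrap configuration follows, assuming it lives in its own directory and uses local state. The file path and resource labels are illustrative and not part of the template; the bucket and table names mirror the backend block, and LockID is the hash key the S3 backend expects for its lock table.

```hcl
# bootstrap/main.tf (hypothetical path): apply once, with local state,
# before running terraform init in the main configuration.
provider "aws" {
  region = "us-east-1"
}

# Bucket that will hold the remote state file
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-terraform-state-bucket"
}

# Versioning keeps earlier state files recoverable
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Lock table used by the S3 backend; the hash key must be named LockID
resource "aws_dynamodb_table" "tf_lock" {
  name         = "terraform-lock-table"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```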

3.2 Variables

variables.tf

variable "primary_region" {
  description = "Primary AWS region"
  type        = string
  default     = "us-east-1"
}

variable "secondary_region" {
  description = "Secondary AWS region for DR"
  type        = string
  default     = "us-west-2"
}

variable "vpc_cidr_primary" {
  description = "CIDR block for primary VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "vpc_cidr_secondary" {
  description = "CIDR block for secondary VPC"
  type        = string
  default     = "10.1.0.0/16"
}

variable "app_ami" {
  description = "AMI ID for application servers"
  type        = string
  default     = "ami-0c55b159cbfafe1f0" # Example AMI; AMI IDs are region-specific
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.medium"
}

variable "db_engine" {
  description = "Database engine"
  type        = string
  default     = "postgres"
}

variable "db_username" {
  description = "Database admin username"
  type        = string
}

variable "db_password" {
  description = "Database admin password"
  type        = string
  sensitive   = true
}

variable "s3_primary_bucket" {
  description = "Primary S3 bucket name"
  type        = string
  default     = "my-app-primary-bucket"
}

variable "s3_secondary_bucket" {
  description = "Secondary S3 bucket name"
  type        = string
  default     = "my-app-secondary-bucket"
}

variable "domain_name" {
  description = "Domain name for Route 53"
  type        = string
  default     = "example.com"
}

variable "hosted_zone_id" {
  description = "Route 53 Hosted Zone ID"
  type        = string
}

Explanation:
Defines all necessary variables with default values where applicable. Sensitive variables like db_password are marked accordingly.
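
Because AMI IDs differ between regions, a single app_ami value cannot be valid in both the primary and the secondary region. One possible approach, sketched here as an assumption rather than as part of the template, is to look up an image in each region with a data source. The data source labels and the Amazon Linux 2023 name pattern are illustrative choices.

```hcl
# Hypothetical addition to the root module: resolve an AMI per region.
data "aws_ami" "primary" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"] # assumed image family
  }
}

data "aws_ami" "secondary" {
  provider    = aws.secondary
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"] # assumed image family
  }
}
```

The compute modules could then receive data.aws_ami.primary.id and data.aws_ami.secondary.id instead of the shared var.app_ami.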

3.3 Networking Module

Path: modules/networking/main.tf

# Availability zones in the region of the provider passed to this module
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "this" {
  cidr_block           = var.vpc_cidr
  enable_dns_support   = true
  enable_dns_hostnames = true
  tags = {
    Name = "${var.name}-vpc"
  }
}

resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.this.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = element(data.aws_availability_zones.available.names, count.index)
  map_public_ip_on_launch = true
  tags = {
    Name = "${var.name}-public-subnet-${count.index + 1}"
  }
}

resource "aws_internet_gateway" "this" {
  vpc_id = aws_vpc.this.id
  tags = {
    Name = "${var.name}-igw"
  }
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.this.id
  tags = {
    Name = "${var.name}-public-rt"
  }
}

# Default route to the internet gateway
resource "aws_route" "internet_access" {
  route_table_id         = aws_route_table.public.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.this.id
}

resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

modules/networking/variables.tf

variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string
}

variable "name" {
  description = "Name prefix for resources"
  type        = string
}

modules/networking/outputs.tf

output "vpc_id" {
  description = "VPC ID"
  value       = aws_vpc.this.id
}

output "public_subnets" {
  description = "List of public subnet IDs"
  value       = aws_subnet.public[*].id
}

Explanation:
Sets up a VPC with two public subnets, an Internet Gateway, and associated route tables. This setup is replicated in both primary and secondary regions.
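
To make the subnet arithmetic concrete, the sketch below evaluates the same cidrsubnet call against the two default VPC CIDRs from variables.tf. The local names are illustrative only.

```hcl
locals {
  # cidrsubnet(prefix, newbits, netnum): adding 8 bits to a /16 yields /24 blocks,
  # and netnum selects which /24 to take.
  primary_subnet_cidrs = [for i in range(2) : cidrsubnet("10.0.0.0/16", 8, i)]
  # => ["10.0.0.0/24", "10.0.1.0/24"]

  secondary_subnet_cidrs = [for i in range(2) : cidrsubnet("10.1.0.0/16", 8, i)]
  # => ["10.1.0.0/24", "10.1.1.0/24"]
}
```

Keeping the two VPC ranges non-overlapping, as the defaults do, leaves room for peering the regions later.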

3.4 Compute Module

Path: modules/compute/main.tf

resource "aws_security_group" "app_sg" {
  vpc_id = var.vpc_id

  # HTTP from anywhere
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # SSH is open to the world here; restrict this CIDR before production use
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Allow all outbound traffic
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.name}-sg"
  }
}

resource "aws_instance" "app" {
  count         = var.instance_count
  ami           = var.app_ami
  instance_type = var.instance_type
  subnet_id     = element(var.subnet_ids, count.index)

  # Instances launched into a VPC subnet reference security groups by ID, not by name
  vpc_security_group_ids = [aws_security_group.app_sg.id]

  tags = {
    Name = "${var.name}-app-${count.index + 1}"
  }
}

modules/compute/variables.tf

variable "vpc_id" {
  description = "VPC ID"
  type        = string
}

variable "subnet_ids" {
  description = "List of subnet IDs"
  type        = list(string)
}

variable "app_ami" {
  description = "AMI ID for the application servers"
  type        = string
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
}

variable "instance_count" {
  description = "Number of EC2 instances"
  type        = number
  default     = 2
}

variable "name" {
  description = "Name prefix for resources"
  type        = string
}

modules/compute/outputs.tf

output "app_instance_ids" {
  description = "List of application EC2 instance IDs"
  value       = aws_instance.app[*].id
}

# Referenced by the root module when wiring up the database modules
output "app_security_group_id" {
  description = "ID of the application security group"
  value       = aws_security_group.app_sg.id
}

Explanation:
Deploys EC2 instances with a security group allowing HTTP and SSH access. The number of instances and other parameters are configurable.
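
The root module in section 7.1 reads app_elb_dns and app_elb_zone_id from the compute module, but the listing above creates no load balancer. A minimal sketch of one way to fill that gap is shown below, assuming an Application Load Balancer in front of the instances. The resource labels and the /health path (borrowed from the Route 53 health check in section 3.7) are assumptions, not part of the original module.

```hcl
# Hypothetical addition to modules/compute/main.tf
resource "aws_lb" "app" {
  name               = "${var.name}-app-alb"
  load_balancer_type = "application"
  subnets            = var.subnet_ids
  security_groups    = [aws_security_group.app_sg.id]
}

resource "aws_lb_target_group" "app" {
  name     = "${var.name}-app-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    path = "/health" # assumed application health endpoint
  }
}

# Register every application instance with the target group
resource "aws_lb_target_group_attachment" "app" {
  count            = var.instance_count
  target_group_arn = aws_lb_target_group.app.arn
  target_id        = aws_instance.app[count.index].id
  port             = 80
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.app.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}

# Hypothetical additions to modules/compute/outputs.tf
output "app_elb_dns" {
  description = "DNS name of the application load balancer"
  value       = aws_lb.app.dns_name
}

output "app_elb_zone_id" {
  description = "Hosted zone ID of the application load balancer"
  value       = aws_lb.app.zone_id
}
```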

3.5 Database Module

Path: modules/database/main.tf

locals {
  # True when this instance is created as a read replica
  is_replica = var.replicate_source_db != null
}

resource "aws_db_subnet_group" "this" {
  name       = "${var.name}-db-subnet-group"
  subnet_ids = var.subnet_ids
  tags = {
    Name = "${var.name}-db-subnet-group"
  }
}

resource "aws_db_instance" "this" {
  identifier              = var.db_identifier
  instance_class          = var.db_instance_class
  allocated_storage       = 100
  storage_type            = "gp2"
  engine_version          = "13.3"
  db_subnet_group_name    = aws_db_subnet_group.this.name
  vpc_security_group_ids  = [var.sg_id]
  multi_az                = var.multi_az
  publicly_accessible     = false
  skip_final_snapshot     = true
  backup_retention_period = 7

  # Set only on the primary; a read replica inherits these from its source instance
  engine   = local.is_replica ? null : var.db_engine
  db_name  = local.is_replica ? null : var.db_name
  username = local.is_replica ? null : var.db_username
  password = local.is_replica ? null : var.db_password

  # Replication for DR: a cross-region replica takes the ARN of the source instance
  replicate_source_db = var.replicate_source_db

  tags = {
    Name = "${var.name}-db"
  }
}

modules/database/variables.tf

variable "subnet_ids" {
  description = "List of subnet IDs"
  type        = list(string)
}

variable "sg_id" {
  description = "Security Group ID for the database"
  type        = string
}

variable "db_engine" {
  description = "Database engine"
  type        = string
}

variable "db_instance_class" {
  description = "Database instance class"
  type        = string
  default     = "db.t3.medium"
}

variable "db_identifier" {
  description = "Database identifier"
  type        = string
}

variable "db_name" {
  description = "Database name"
  type        = string
}

variable "db_username" {
  description = "Database admin username"
  type        = string
}

variable "db_password" {
  description = "Database admin password"
  type        = string
  sensitive   = true
}

variable "multi_az" {
  description = "Enable Multi-AZ deployment"
  type        = bool
  default     = true
}

variable "name" {
  description = "Name prefix for resources"
  type        = string
}

variable "replicate_source_db" {
  description = "ARN of the source DB instance for replication"
  type        = string
  default     = null
}

modules/database/outputs.tf

output "db_instance_endpoint" {
  description = "Database instance endpoint"
  value       = aws_db_instance.this.endpoint
}

output "db_instance_id" {
  description = "Database instance ID"
  value       = aws_db_instance.this.id
}

# Cross-region read replicas reference their source by ARN
output "db_instance_arn" {
  description = "Database instance ARN"
  value       = aws_db_instance.this.arn
}

Explanation:
Creates an RDS PostgreSQL instance with Multi-AZ for high availability. In the secondary region, it sets up the database as a read replica by specifying the replicate_source_db.
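
The root module passes the application security group as sg_id, and that group opens ports 80 and 22 but not the PostgreSQL port. A minimal sketch of a dedicated database security group follows, assuming two extra module inputs (vpc_id and app_sg_id) that the original module does not define.

```hcl
# Hypothetical additions to modules/database
variable "vpc_id" {
  description = "VPC ID for the database security group"
  type        = string
}

variable "app_sg_id" {
  description = "Security group ID of the application servers"
  type        = string
}

resource "aws_security_group" "db" {
  name_prefix = "${var.name}-db-"
  vpc_id      = var.vpc_id

  # PostgreSQL traffic is accepted only from the application servers
  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [var.app_sg_id]
  }

  tags = {
    Name = "${var.name}-db-sg"
  }
}
```

With this in place, vpc_security_group_ids on the instance could point at aws_security_group.db.id instead of var.sg_id.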

3.6 S3 Bucket Replication Module
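Path: modules/s3_replication/main.tf

# This module manages buckets in two regions, so it declares the aliased
# provider that the root module must pass in as aws.secondary.
terraform {
  required_providers {
    aws = {
      source                = "hashicorp/aws"
      configuration_aliases = [aws.secondary]
    }
  }
}

# New buckets are private by default, so no ACL is set.
resource "aws_s3_bucket" "source" {
  bucket = var.source_bucket

  tags = {
    Name = var.source_bucket
  }
}

# Replication requires versioning on both buckets.
resource "aws_s3_bucket_versioning" "source" {
  bucket = aws_s3_bucket.source.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket" "destination" {
  provider = aws.secondary
  bucket   = var.destination_bucket

  tags = {
    Name = var.destination_bucket
  }
}

resource "aws_s3_bucket_versioning" "destination" {
  provider = aws.secondary
  bucket   = aws_s3_bucket.destination.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_replication_configuration" "this" {
  # Versioning must be enabled before replication can be configured
  depends_on = [
    aws_s3_bucket_versioning.source,
    aws_s3_bucket_versioning.destination,
  ]

  bucket = aws_s3_bucket.source.id
  role   = aws_iam_role.replication_role.arn

  rule {
    id     = "replicate-all"
    status = "Enabled"

    # An empty filter applies the rule to every object
    filter {}

    delete_marker_replication {
      status = "Disabled"
    }

    destination {
      bucket        = aws_s3_bucket.destination.arn
      storage_class = "STANDARD"
    }
  }
}

resource "aws_iam_role" "replication_role" {
  name = "${var.name}-s3-replication-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "s3.amazonaws.com"
      }
    }]
  })
}

# Permissions S3 needs to read from the source and write replicas to the destination
resource "aws_iam_role_policy" "replication" {
  name = "${var.name}-s3-replication-policy"
  role = aws_iam_role.replication_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["s3:GetReplicationConfiguration", "s3:ListBucket"]
        Resource = [aws_s3_bucket.source.arn]
      },
      {
        Effect   = "Allow"
        Action   = ["s3:GetObjectVersionForReplication", "s3:GetObjectVersionAcl", "s3:GetObjectVersionTagging"]
        Resource = ["${aws_s3_bucket.source.arn}/*"]
      },
      {
        Effect   = "Allow"
        Action   = ["s3:ReplicateObject", "s3:ReplicateDelete", "s3:ReplicateTags"]
        Resource = ["${aws_s3_bucket.destination.arn}/*"]
      }
    ]
  })
}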

modules/s3_replication/variables.tf

variable "source_bucket" {
  description = "Source S3 bucket name"
  type        = string
}

variable "destination_bucket" {
  description = "Destination S3 bucket name"
  type        = string
}

variable "name" {
  description = "Name prefix for resources"
  type        = string
}

modules/s3_replication/outputs.tf

output "source_bucket_id" {
  description = "Source S3 bucket ID"
  value       = aws_s3_bucket.source.id
}

output "destination_bucket_id" {
  description = "Destination S3 bucket ID"
  value       = aws_s3_bucket.destination.id
}

Explanation:
Sets up S3 bucket replication from the primary to the secondary region. It creates both source and destination buckets with versioning enabled and configures replication rules.

3.7 Route 53 Failover Configuration

Path: modules/route53_failover/main.tf

resource "aws_route53_health_check" "primary_health" {
  fqdn              = var.primary_fqdn
  type              = "HTTP"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
}

# Failover records are selected by routing policy and health, so no weight is set.
resource "aws_route53_record" "primary" {
  zone_id = var.zone_id
  name    = var.record_name
  type    = "A"

  set_identifier = "primary"

  alias {
    name                   = var.primary_elb_dns
    zone_id                = var.primary_elb_zone_id
    evaluate_target_health = true
  }

  health_check_id = aws_route53_health_check.primary_health.id

  failover_routing_policy {
    type = "PRIMARY"
  }
}

resource "aws_route53_record" "secondary" {
  zone_id = var.zone_id
  name    = var.record_name
  type    = "A"

  set_identifier = "secondary"

  alias {
    name                   = var.secondary_elb_dns
    zone_id                = var.secondary_elb_zone_id
    evaluate_target_health = true
  }

  failover_routing_policy {
    type = "SECONDARY"
  }
}

modules/route53_failover/variables.tf

variable "zone_id" {
  description = "Route 53 Hosted Zone ID"
  type        = string
}

variable "record_name" {
  description = "DNS record name"
  type        = string
}

variable "primary_fqdn" {
  description = "Primary application FQDN for health checks"
  type        = string
}

variable "primary_elb_dns" {
  description = "Primary ELB DNS name"
  type        = string
}

variable "primary_elb_zone_id" {
  description = "Primary ELB Hosted Zone ID"
  type        = string
}

variable "secondary_elb_dns" {
  description = "Secondary ELB DNS name"
  type        = string
}

variable "secondary_elb_zone_id" {
  description = "Secondary ELB Hosted Zone ID"
  type        = string
}

modules/route53_failover/outputs.tf

output "primary_health_check_id" {
  description = "Primary health check ID"
  value       = aws_route53_health_check.primary_health.id
}

Explanation:
Configures Route 53 DNS failover with health checks. If the primary ELB fails the health check, traffic is routed to the secondary ELB.
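
The health check probes var.primary_fqdn (app.primary.example.com with the defaults used in the root module), yet no record for that name is created anywhere in the template. A minimal sketch of a direct alias record for the health check target is given below; the resource label is an assumption, and the record is not part of the original module.

```hcl
# Hypothetical addition to modules/route53_failover/main.tf:
# a plain alias record so the health check FQDN resolves to the primary load balancer.
resource "aws_route53_record" "primary_direct" {
  zone_id = var.zone_id
  name    = var.primary_fqdn
  type    = "A"

  alias {
    name                   = var.primary_elb_dns
    zone_id                = var.primary_elb_zone_id
    evaluate_target_health = false
  }
}
```

Pointing the health check at a name that always resolves to the primary keeps it from following the failover record to the secondary region.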

3.8 Outputs
outputs.tf

output "primary_vpc_id" {
  description = "Primary VPC ID"
  value       = module.networking_primary.vpc_id
}

output "secondary_vpc_id" {
  description = "Secondary VPC ID"
  value       = module.networking_secondary.vpc_id
}

output "primary_app_instances" {
  description = "Primary application EC2 instances"
  value       = module.compute_primary.app_instance_ids
}

output "secondary_app_instances" {
  description = "Secondary application EC2 instances"
  value       = module.compute_secondary.app_instance_ids
}

output "primary_db_endpoint" {
  description = "Primary DB Endpoint"
  value       = module.database_primary.db_instance_endpoint
}

output "secondary_db_endpoint" {
  description = "Secondary DB Endpoint"
  value       = module.database_secondary.db_instance_endpoint
}

output "s3_primary_bucket" {
  description = "Primary S3 Bucket"
  value       = module.s3_replication_primary.source_bucket_id
}

output "s3_secondary_bucket" {
  description = "Secondary S3 Bucket"
  value       = module.s3_replication_primary.destination_bucket_id
}

Explanation:
Exports essential information about the deployed resources, such as VPC IDs, EC2 instance IDs, database endpoints, and S3 bucket IDs.

  4. Deploying the Terraform Template

Follow these steps to deploy the DR setup using the provided Terraform template.

4.1 Clone the Repository

git clone https://github.com/your-repo/terraform-dr-setup.git
cd terraform-dr-setup

Note: Replace https://github.com/your-repo/terraform-dr-setup.git with your actual repository URL if applicable.

4.2 Initialize Terraform

Initialize the Terraform working directory, download plugins, and configure the backend.

terraform init

4.3 Review the Plan

Generate and review the execution plan to ensure resources are created as expected.

terraform plan -var="db_username=dbadmin" -var="db_password=yourpassword" -var="hosted_zone_id=Z1234567890"

Replace yourpassword with a secure password and Z1234567890 with your actual Route 53 Hosted Zone ID.
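
Passing values with -var on every command is easy to mistype and leaves the password in shell history. As an alternative sketch, the non-secret values could live in a terraform.tfvars file, which Terraform loads automatically from the working directory. The values shown are placeholders.

```hcl
# terraform.tfvars (keep this file out of version control)
db_username    = "dbadmin"     # placeholder username
hosted_zone_id = "Z1234567890" # placeholder hosted zone ID

# db_password is deliberately left out of this file. Terraform also reads
# variables from the environment, so it can be supplied by exporting
# TF_VAR_db_password before running plan or apply.
```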

4.4 Apply the Configuration

Apply the Terraform configuration to create the resources.

terraform apply -var="db_username=dbadmin" -var="db_password=yourpassword" -var="hosted_zone_id=Z1234567890" -auto-approve

Warning: The -auto-approve flag skips the confirmation prompt. Remove it if you prefer manual approval.

  5. Testing the DR Setup

After deployment, it’s essential to test the DR setup to ensure failover works seamlessly.

5.1 Verify Resource Creation

  • VPCs: Ensure both primary and secondary VPCs are created.
  • EC2 Instances: Check that EC2 instances are running in both regions.
  • RDS Instances: Confirm that the secondary RDS instance is a read replica.
  • S3 Buckets: Verify that replication is configured between primary and secondary buckets.
  • Route 53: Ensure DNS records are set up with failover policies.

5.2 Simulate Failover

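  1. Primary Application Down:
    • Stop or terminate primary EC2 instances or the ELB.

  2. Health Check Failure:
    • Ensure Route 53 detects the failure via health checks. With failure_threshold = 3 and request_interval = 30, detection takes roughly 90 seconds.

  3. Traffic Routing:
    • Verify that traffic is routed to the secondary ELB once the health check has failed and cached DNS answers have expired.

  4. Data Consistency:
    • Check that data in the secondary database and S3 bucket is up-to-date. The secondary database is a read replica, so it must be promoted before it can accept writes.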

5.3 Restore Primary Services

Once testing is complete, restore the primary services and ensure Route 53 redirects traffic back if primary services are healthy.

  6. Maintenance and Best Practices

To ensure the DR setup remains robust and secure, follow these best practices:

6.1 Regular Updates

  • Terraform: Keep Terraform updated to the latest version.
  • AWS Services: Monitor and apply updates to AWS services and configurations.

6.2 Monitoring and Alerts

  • Implement monitoring using AWS CloudWatch or other monitoring tools.
  • Set up alerts for critical events, such as failovers or resource failures.
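
As one possible starting point for failover alerting, the sketch below raises a CloudWatch alarm when the primary Route 53 health check reports unhealthy. The SNS topic and alarm names are assumptions. Route 53 publishes health check metrics in us-east-1, which matches the default primary_region, so the default provider is used here.

```hcl
# Hypothetical addition to the root module
resource "aws_sns_topic" "dr_alerts" {
  name = "dr-alerts"
}

resource "aws_cloudwatch_metric_alarm" "primary_unhealthy" {
  alarm_name          = "primary-endpoint-unhealthy"
  namespace           = "AWS/Route53"
  metric_name         = "HealthCheckStatus" # 1 = healthy, 0 = unhealthy
  statistic           = "Minimum"
  comparison_operator = "LessThanThreshold"
  threshold           = 1
  period              = 60
  evaluation_periods  = 1
  alarm_actions       = [aws_sns_topic.dr_alerts.arn]

  dimensions = {
    HealthCheckId = module.route53_failover.primary_health_check_id
  }
}
```

Subscribing an email address or chat webhook to the topic would turn the alarm into a notification that a failover has probably started.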

6.3 Security Management

  • Regularly rotate database passwords and access keys.
  • Implement IAM best practices, granting least privilege.
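
Rotation is easier when the database credentials are read from a secret store instead of being typed on the command line. A minimal sketch using AWS Secrets Manager follows, assuming a pre-existing secret whose value is a JSON object with username and password keys. The secret name is hypothetical.

```hcl
# Hypothetical addition to the root module
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/appdb/master" # hypothetical secret name
}

locals {
  # Assumes the secret value looks like {"username": "...", "password": "..."}
  db_credentials = jsondecode(data.aws_secretsmanager_secret_version.db.secret_string)
}

# The database_primary module call could then use:
#   db_username = local.db_credentials.username
#   db_password = local.db_credentials.password
```

The resolved password is still written to the Terraform state, so the encrypted S3 backend and tight access to the state bucket remain important.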

6.4 Cost Management

  • Monitor AWS costs to avoid unexpected charges.
  • Utilize AWS Cost Explorer and budgeting tools.

6.5 Documentation

  • Maintain up-to-date documentation of the infrastructure and DR procedures.
  • Document any changes made to the Terraform configuration.

  7. Complete Terraform Code

For your convenience, here’s the complete Terraform code structured as described above. You can copy and use it directly, ensuring you adjust variables like db_username, db_password, and hosted_zone_id as needed.

7.1 Root Module Files

main.tf

module "networking_primary" {
  source   = "./modules/networking"
  vpc_cidr = var.vpc_cidr_primary
  name     = "primary"
}

module "networking_secondary" {
  source    = "./modules/networking"
  providers = { aws = aws.secondary }
  vpc_cidr  = var.vpc_cidr_secondary
  name      = "secondary"
}

module "compute_primary" {
  source         = "./modules/compute"
  vpc_id         = module.networking_primary.vpc_id
  subnet_ids     = module.networking_primary.public_subnets
  app_ami        = var.app_ami
  instance_type  = var.instance_type
  instance_count = 2
  name           = "primary"
}

module "compute_secondary" {
  source         = "./modules/compute"
  providers      = { aws = aws.secondary }
  vpc_id         = module.networking_secondary.vpc_id
  subnet_ids     = module.networking_secondary.public_subnets
  app_ami        = var.app_ami # AMI IDs are region-specific; this ID must exist in the secondary region
  instance_type  = var.instance_type
  instance_count = 2
  name           = "secondary"
}

module "database_primary" {
  source            = "./modules/database"
  subnet_ids        = module.networking_primary.public_subnets
  sg_id             = module.compute_primary.app_security_group_id # the compute module must export this output
  db_engine         = var.db_engine
  db_instance_class = "db.t3.medium"
  db_identifier     = "primary-db"
  db_name           = "appdb"
  db_username       = var.db_username
  db_password       = var.db_password
  multi_az          = true
  name              = "primary"
}

module "database_secondary" {
  source            = "./modules/database"
  providers         = { aws = aws.secondary }
  subnet_ids        = module.networking_secondary.public_subnets
  sg_id             = module.compute_secondary.app_security_group_id
  db_engine         = var.db_engine
  db_instance_class = "db.t3.medium"
  db_identifier     = "secondary-db"
  db_name           = "appdb"
  db_username       = var.db_username
  db_password       = var.db_password
  multi_az          = true
  name              = "secondary"

  # Cross-region replicas take the source instance ARN, not its ID; the database
  # module must export db_instance_arn (value = aws_db_instance.this.arn).
  replicate_source_db = module.database_primary.db_instance_arn
}

module "s3_replication_primary" {
  source = "./modules/s3_replication"

  # The module creates the destination bucket through aws.secondary,
  # so both provider configurations are passed in explicitly.
  providers = {
    aws           = aws
    aws.secondary = aws.secondary
  }

  source_bucket      = var.s3_primary_bucket
  destination_bucket = var.s3_secondary_bucket
  name               = "s3-replication"
}

# app_elb_dns and app_elb_zone_id are not defined in section 3.4; the compute
# module must create a load balancer and export both values.
module "route53_failover" {
  source                = "./modules/route53_failover"
  zone_id               = var.hosted_zone_id
  record_name           = "app.${var.domain_name}"
  primary_fqdn          = "app.primary.${var.domain_name}"
  primary_elb_dns       = module.compute_primary.app_elb_dns
  primary_elb_zone_id   = module.compute_primary.app_elb_zone_id
  secondary_elb_dns     = module.compute_secondary.app_elb_dns
  secondary_elb_zone_id = module.compute_secondary.app_elb_zone_id
}

variables.tf

[As defined earlier]

outputs.tf

[As defined earlier]

backend.tf

[As defined earlier]

7.2 Modules
For brevity, only key components of each module are shown. Ensure each module (networking, compute, database, s3_replication, route53_failover) contains the respective main.tf, variables.tf, and outputs.tf as outlined in sections 3.3 to 3.7.

Conclusion
This comprehensive Terraform module template provides a robust foundation for setting up a Disaster Recovery environment with primary and secondary clusters on AWS. By following this guide, you can deploy a resilient infrastructure designed to handle failovers seamlessly without downtime.

Next Steps:

  1. Customize Variables: Adjust variables like db_username, db_password, and hosted_zone_id to match your environment.
  2. Secure Secrets: Consider using Terraform’s Sensitive Variables or integrating with secret management tools like AWS Secrets Manager or HashiCorp Vault.
  3. Enhance Security: Implement additional security measures such as restricting SSH access, enabling encryption for data at rest and in transit, and configuring IAM roles with least privilege.
  4. Automate Deployments: Integrate this Terraform setup into your CI/CD pipelines for automated deployments and updates.
  5. Continuous Monitoring: Set up comprehensive monitoring and alerting to proactively manage the health of your infrastructure.

By leveraging Terraform’s infrastructure as code capabilities, you can maintain consistency, reproducibility, and scalability in your Disaster Recovery strategy, ensuring high availability and business continuity.
