Your Current Backup Automation Is Missing the Key: Ansible, AWS S3, and Cron 🔑
Niran
Posted on October 20, 2024
Data backups are a crucial part of any system or application, ensuring that you don't lose important files when things go wrong. Even if you already have some automation in place, any backup steps that remain manual are time-consuming and prone to human error.
That's where enhancing your automation comes in! In this blog, I'll show you how to supercharge your existing backup process by automating the backup of critical files and application data to AWS S3 using Ansible. We'll also set up Cron Jobs to handle the scheduling, ensuring your backups run smoothly and on time. Additionally, we'll cover how to restore those backups in case you ever need to retrieve them.
To set up the automated backup and restore process using Ansible, AWS S3, and Cron Jobs, you need to install and configure the following components:
AWS Account:
- Ensure you create an AWS account if you don't have one.
AWS CLI:
- Install the AWS Command Line Interface (CLI) to manage your AWS services from the command line.
- After installing, configure the AWS CLI with your AWS credentials.
aws configure
- You’ll need your AWS Access Key ID, Secret Access Key, default region, and output format.
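The command walks you through these values interactively; the values below are placeholders, not real credentials:
$ aws configure
AWS Access Key ID [None]: <your-access-key-id>
AWS Secret Access Key [None]: <your-secret-access-key>
Default region name [None]: us-east-1
Default output format [None]: json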
Ansible:
- Install Ansible on your Linux server.
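For example, on a Debian/Ubuntu system (assumed throughout this guide, since the other commands here use apt):
sudo apt update
sudo apt install ansible -y
ansible --version   # Confirm the installation succeeded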
Python and Boto3:
- Ansible requires Python to run, so ensure it's installed.
- Install Boto3 (the AWS SDK for Python), which Ansible uses for AWS-related tasks.
pip3 install boto3
If you encounter the "externally-managed-environment" error when trying to install boto3 using pip3, follow these steps to resolve the issue:
- Install the Virtual Environment Package: First, ensure you have the necessary package to create virtual environments. Run the following command:
sudo apt install python3-venv
- Create a Virtual Environment: Use the following command to create a new virtual environment. You can name the folder anything you like; in this example, we'll use myenv:
python3 -m venv myenv
- Activate the Virtual Environment: Activate the virtual environment with this command:
source myenv/bin/activate
Your terminal prompt will change to indicate that you are now working inside the virtual environment.
- Install boto3 Using pip3: Now, you can install boto3 without encountering the previous error. Run:
pip3 install boto3
- Verify the Installation: After installation, you can check if boto3 was installed correctly by running:
pip3 show boto3
- Deactivate the Virtual Environment: Once you're done working, you can exit the virtual environment by running:
deactivate
Ansible Collections:
- Install the necessary Ansible collections for AWS to ensure smooth integration with your cloud environment:
ansible-galaxy collection install amazon.aws
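You can confirm the collection is available afterwards:
ansible-galaxy collection list amazon.aws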
Cron:
- Cron is typically pre-installed on Linux systems. You can check if it's running with:
systemctl status cron
- If it's not installed, you can install it:
sudo apt install cron -y
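On systemd-based distributions, you can also make sure the service is enabled so it starts on boot:
sudo systemctl enable --now cron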
Additional Tools (Optional):
- You might consider installing tools like gzip or tar if they're not already installed, as these are often used for compressing files.
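On Debian/Ubuntu, for example:
sudo apt install tar gzip -y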
With all the necessary components installed and configured, here’s your next step:
Create S3 Bucket via AWS CLI:
aws s3 mb s3://my-app-backups --region us-east-1
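You can verify that the bucket was created:
aws s3 ls   # The new bucket should appear in the listing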
S3 Bucket Permissions:
- Access Permissions: The IAM user or role used by your Ansible playbooks must have permission to list the bucket, read objects from it (for restores), and write objects to it (for backups). This is typically granted through an IAM policy. For example:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": "arn:aws:s3:::my-app-backups"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-app-backups/*"
    }
  ]
}
Replace my-app-backups with your desired bucket name, and adjust the region if needed.
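One way to attach this policy is with the AWS CLI: save the JSON above to a file and apply it to your IAM user. The user name, policy name, and file name below are placeholders:
aws iam put-user-policy \
  --user-name backup-user \
  --policy-name myapp-backup-policy \
  --policy-document file://backup-policy.json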
Once the bucket is created, you can begin uploading files automatically using Ansible playbooks. Let’s start by creating the playbooks:
You’ll need two playbooks:
- One for backing up files to AWS S3.
- One for restoring the files from AWS S3.
Example: Backup Playbook (backup.yml)
This playbook compresses a directory and uploads the tarball to your S3 bucket.
---
# Playbook to back up the /etc/myapp directory to AWS S3.
- name: Backup /etc/myapp to AWS S3
  hosts: localhost  # Run the play on the local machine
  tasks:
    - name: Compress /etc/myapp directory
      community.general.archive:
        path: /etc/myapp                # Source directory to compress
        dest: /tmp/myapp_backup.tar.gz  # Destination tarball in /tmp

    - name: Upload tarball to AWS S3
      amazon.aws.s3_object:
        bucket: my-app-backups  # S3 bucket where the backup will be stored
        object: backups/myapp_backup_{{ ansible_date_time.iso8601 }}.tar.gz
        # The object key includes the current ISO 8601 timestamp so every backup filename is unique.
        src: /tmp/myapp_backup.tar.gz  # The tarball to upload
        mode: put                      # 'put' uploads the file to the bucket
      delegate_to: localhost  # Perform the upload from the localhost
Example: Restore Playbook (restore.yml)
This playbook downloads the backups stored in the S3 bucket and saves them to a local restore directory.
---
# Playbook to restore backups from AWS S3 to a local directory.
- name: Restore files from S3 bucket to local
  hosts: localhost
  gather_facts: no
  tasks:
    - name: Ensure the restore directory exists
      ansible.builtin.file:
        path: /home/niran/restore  # Local directory for the restored files
        state: directory           # Create the directory if it doesn't exist

    - name: List objects in the S3 bucket
      amazon.aws.s3_object:
        bucket: my-app-backups  # Bucket to list objects from
        mode: list              # List the object keys in the bucket
      register: s3_objects      # Save the output for the tasks below

    - name: Debug full S3 objects output
      ansible.builtin.debug:
        var: s3_objects  # Show the full listing for troubleshooting

    - name: Download files from S3 bucket to local
      amazon.aws.s3_object:
        bucket: my-app-backups
        object: "{{ item }}"                               # Key of the object to download
        dest: "/home/niran/restore/{{ item | basename }}"  # Local destination path
        mode: get                                          # Download the object
      loop: "{{ s3_objects.s3_keys }}"  # Iterate over every key returned by the list task
      when: s3_objects.s3_keys is defined and s3_objects.s3_keys | length > 0
      # Only run when the listing returned at least one key
Before setting up the Cron job, manually run the playbooks to ensure they work as expected:
Backup Test
Run the backup playbook:
ansible-playbook /path/to/backup.yml
Verify that the compressed file is created and uploaded to S3.
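For example, you can check both the local tarball and the uploaded object from the command line:
ls -lh /tmp/myapp_backup.tar.gz
aws s3 ls s3://my-app-backups/backups/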
Restore Test
Run the restore playbook:
ansible-playbook /path/to/restore.yml
Verify that the restored files have been downloaded.
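A quick listing of the restore directory confirms the download:
ls -lh /home/niran/restore/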
Note: Ensure that the necessary permissions are set for both the S3 bucket and the local file system to avoid any access issues during the restore process.
Set Up the Cron Job:
- Open the Crontab file for editing:
crontab -e
- Add the Cron job line at the end of the file, for example:
30 11 * * * ansible-playbook /path/to/backup.yml
- Save and exit the editor.
- You can verify that your Cron job has been set up correctly by listing the current Cron jobs:
crontab -l
Note: Make sure to replace /path/to/backup.yml with the actual path to your Ansible backup playbook. Also, ensure that the user executing the Cron job has the necessary permissions to run Ansible and access the required files and directories.
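Because Cron runs with a minimal environment, a more robust crontab entry uses the absolute path to ansible-playbook (find yours with which ansible-playbook; /usr/bin/ansible-playbook and the log file below are assumptions, adjust to your system) and captures the output for troubleshooting:
30 11 * * * /usr/bin/ansible-playbook /path/to/backup.yml >> $HOME/ansible-backup.log 2>&1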
By automating backup and restore processes with Ansible, AWS S3, and Cron jobs, we've ensured reliable and consistent management of critical data. This approach simplifies complex tasks and provides an efficient, scalable, and repeatable solution that can be adapted to many use cases. With daily backups scheduled via Cron and seamless restores handled through Ansible playbooks, your data management can be fully automated, saving time and reducing the risk of human error.
For those looking to implement this solution, you can find the full project repository on GitHub here.
Happy automating!