Elasticsearch: Snapshot and Restore with AWS S3

Elasticsearch makes it easy to back up and restore data. In this tutorial, we will store backups in AWS S3: we will take snapshots, restore from a snapshot, and create a cron job that takes a snapshot daily. The tutorial is divided into the following sections:

Snapshot
Snapshot Repository
Registering Snapshot Repository
Taking Snapshot
Restoring From Snapshot
Monitoring Snapshot and Restore Progress
Daily Backup

Snapshot :

A snapshot is a backup taken from a running Elasticsearch cluster. We can take a snapshot of individual indices or of the entire cluster. Snapshots are incremental: each snapshot of an index stores only the data that is not already part of an earlier snapshot.

Snapshot Repository :

A snapshot repository is a container that stores snapshots. Snapshots can be stored in either local or remote repositories. Remote repositories can reside on AWS S3, HDFS, Azure, Google Cloud Storage, and other platforms supported by a repository plugin.

To retrieve information about all registered snapshot repositories:

curl -X GET "localhost:9200/_snapshot/_all?pretty"

Registering Snapshot Repository :

We must register a repository before we can take snapshots and restore from them. To register AWS S3 as a snapshot repository, we will follow these steps:

AWS Setup
S3 Elasticsearch Plugin Installation
Snapshot Repository Registration

AWS Setup :

S3 Bucket: In this guide, we will create an S3 bucket named S3-BUCKET-NAME.
Custom Policy: We will create a custom policy with the following policy document:

{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::S3-BUCKET-NAME"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::S3-BUCKET-NAME/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}

IAM User: Then, we will create an IAM user and attach the custom policy to it. We need to note this user's ACCESS_KEY_ID and SECRET_ACCESS_KEY.
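
The AWS setup above can also be scripted. The following is a minimal sketch, assuming the AWS CLI is installed and configured with credentials that may manage S3 and IAM, and that the policy document above has been saved to a file. The function name setup_snapshot_bucket and the "-policy" suffix are illustrative choices, not part of the original setup.

```shell
#!/bin/bash
# Sketch: create the bucket, custom policy, and IAM user with the AWS CLI.
# Arguments: bucket name, AWS region, IAM user name, path to the policy file.
# All values passed in are placeholders chosen by the caller.
setup_snapshot_bucket() {
  local bucket="$1" region="$2" name="$3" policy_file="$4" policy_arn

  # 1. S3 bucket that will hold the snapshots.
  aws s3 mb "s3://$bucket" --region "$region" || return 1

  # 2. Custom policy created from the saved policy document; keep its ARN.
  policy_arn=$(aws iam create-policy \
    --policy-name "$name-policy" \
    --policy-document "file://$policy_file" \
    --query 'Policy.Arn' --output text) || return 1

  # 3. IAM user with the custom policy attached.
  aws iam create-user --user-name "$name" || return 1
  aws iam attach-user-policy --user-name "$name" --policy-arn "$policy_arn" || return 1

  # 4. Prints the access key ID and secret access key; store them safely.
  aws iam create-access-key --user-name "$name"
}
```

A call might look like: setup_snapshot_bucket my-es-backups us-east-1 es-snapshot-user policy.json. Note that real S3 bucket names must be lowercase, so S3-BUCKET-NAME is only a placeholder.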

S3 Elasticsearch Plugin Installation :

Install S3 plugin:

    cd /usr/share/elasticsearch
    sudo bin/elasticsearch-plugin install --batch repository-s3

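Restart Elasticsearch after installing the plugin. For a quick setup, add -Des.allow_insecure_settings=true to /etc/elasticsearch/jvm.options, which lets the repository settings carry the access key and secret key in plain text, as in the registration request below. For a more secure setup, we can store the credentials with elasticsearch-keystore instead.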
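
For the keystore option, a minimal sketch follows. It assumes the default package install path and the default S3 client, whose secure settings are named s3.client.default.access_key and s3.client.default.secret_key. The function name store_s3_credentials and the KEYSTORE_BIN variable are illustrative.

```shell
#!/bin/bash
# Assumption: default package install location; override ES_HOME if yours differs.
ES_HOME="${ES_HOME:-/usr/share/elasticsearch}"
KEYSTORE_BIN="${KEYSTORE_BIN:-$ES_HOME/bin/elasticsearch-keystore}"

# Store the IAM user's credentials in the Elasticsearch keystore.
# Arguments: access key ID, secret access key.
store_s3_credentials() {
  local access_key="$1" secret_key="$2"
  # --stdin reads the value from standard input instead of prompting for it.
  printf '%s' "$access_key" | "$KEYSTORE_BIN" add --stdin s3.client.default.access_key || return 1
  printf '%s' "$secret_key" | "$KEYSTORE_BIN" add --stdin s3.client.default.secret_key
}
```

This would need to be run on every node, as a user allowed to modify the keystore (typically with sudo), followed by a restart. With the credentials in the keystore, the repository registration no longer needs the access_key and secret_key settings, and the allow_insecure_settings option is not required.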

Snapshot Repository Registration :

To store backups in this S3 bucket, we must register the bucket as a snapshot repository. We can register it from the command line:

curl -X PUT "localhost:9200/_snapshot/REPOSITORY_NAME?pretty" -H 'Content-Type: application/json' -d'
{
 "type": "s3",
 "settings": {
   "bucket": "S3-BUCKET-NAME",
   "region": "AWS_REGION",
   "access_key": "ACCESS_KEY_ID",
   "secret_key": "SECRET_ACCESS_KEY"
 }
}
'
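
Registration checks by default that the nodes can access the repository, and the same check can be repeated later with the verify API, for example after changing credentials or bucket permissions. A small sketch, assuming the cluster listens on localhost:9200 as elsewhere in this tutorial (the function name verify_repository is illustrative):

```shell
#!/bin/bash
# Assumption: Elasticsearch listens on localhost:9200.
ES_URL="${ES_URL:-localhost:9200}"

# Ask the cluster to confirm that its nodes can reach and use the repository.
# Argument: repository name.
verify_repository() {
  curl -sS -X POST "$ES_URL/_snapshot/$1/_verify?pretty"
}
```

Calling verify_repository REPOSITORY_NAME should list the nodes that verified the repository, or return an error describing what went wrong.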

Taking Snapshot :

We can take a snapshot of a running Elasticsearch cluster with the following command. Here, SNAPSHOT_NAME must be unique within REPOSITORY_NAME.

curl -X PUT "localhost:9200/_snapshot/REPOSITORY_NAME/SNAPSHOT_NAME?wait_for_completion=true&pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": false
}
'

All snapshots currently stored in the repository can be listed using the following command:

curl -X GET "localhost:9200/_snapshot/REPOSITORY_NAME/_all?pretty"

Restoring From Snapshot :

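To restore indices from S3, we can run the following command. The rename_pattern and rename_replacement settings restore each index under a new name (index_1 becomes restored_index_1):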

curl -X POST "localhost:9200/_snapshot/REPOSITORY_NAME/SNAPSHOT_NAME/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "index_(.+)",
  "rename_replacement": "restored_index_$1",
  "include_aliases": false
}
'
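
The rename in the request above is a regular-expression substitution on each index name. As a rough local preview, the same mapping can be tried with sed. This is only an approximation, since Elasticsearch applies Java regular expressions, and the function name preview_rename is illustrative.

```shell
#!/bin/bash
# Preview how the rename_pattern / rename_replacement pair maps an index name.
# Note: sed writes the first capture group as \1, Elasticsearch writes it as $1.
preview_rename() {
  printf '%s\n' "$1" | sed -E 's/index_(.+)/restored_index_\1/'
}
```

Here preview_rename index_1 prints restored_index_1, while a name that does not match the pattern, such as logs, is printed unchanged.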

Monitoring Snapshot and Restore Progress :

We can monitor the status of a snapshot with the following command:

curl -X GET "localhost:9200/_snapshot/REPOSITORY_NAME/SNAPSHOT_NAME/_status?pretty"
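
A restore runs through the normal shard recovery mechanism, so its progress can be followed with the recovery APIs. A minimal sketch, assuming the cluster listens on localhost:9200 (the function names are illustrative):

```shell
#!/bin/bash
# Assumption: Elasticsearch listens on localhost:9200.
ES_URL="${ES_URL:-localhost:9200}"

# Shard-by-shard recovery details for one index, for example a restored one.
# Argument: index name.
restore_progress() {
  curl -sS -X GET "$ES_URL/$1/_recovery?pretty"
}

# Compact table of the shard recoveries that are still running in the cluster.
active_recoveries() {
  curl -sS -X GET "$ES_URL/_cat/recovery?v&active_only=true"
}
```

For the restore example above, restore_progress restored_index_1 would show the recovery stage and the amount of data recovered for each shard of that index.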

Daily Backup :

Creating a Bash Script: We will create a bash script, e.g. daily_elastic_search_backup.sh, as follows:

#!/bin/bash
# Take a daily snapshot of the listed indices and store it in the S3 repository.

TODAY=$(date +'%Y.%m.%d')
echo "Today's ($TODAY) indices will be stored in S3."

ELASTIC_SEARCH_HOST="localhost"
ELASTIC_SEARCH_PORT="9200"
REPOSITORY_NAME="REPOSITORY_NAME"
SNAPSHOT_NAME="snapshot-$TODAY"

echo "Starting snapshot $SNAPSHOT_NAME"

# --fail makes curl exit non-zero on an HTTP error, so a failed request is not reported as a success.
if curl --fail -X PUT "$ELASTIC_SEARCH_HOST:$ELASTIC_SEARCH_PORT/_snapshot/$REPOSITORY_NAME/$SNAPSHOT_NAME?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
  "indices": "index-1,index-2",
  "ignore_unavailable": true,
  "include_global_state": false
}
'
then
  echo "Successfully completed storing $SNAPSHOT_NAME in S3"
else
  echo "Snapshot $SNAPSHOT_NAME failed" >&2
  exit 1
fi
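
With wait_for_completion=true, the response body includes the snapshot's state, so a backup script can check that state before reporting success. The following is a minimal sketch that extracts the field with sed rather than a JSON parser. The function names snapshot_state and take_snapshot_checked are illustrative, and the index names are the placeholders used in the script.

```shell
#!/bin/bash
# Read a snapshot API response on stdin and print the value of its "state"
# field (for example SUCCESS, PARTIAL, or FAILED).
snapshot_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Take a snapshot and succeed only if Elasticsearch reports the state SUCCESS.
# Argument: full snapshot URL, e.g.
#   localhost:9200/_snapshot/REPOSITORY_NAME/SNAPSHOT_NAME
take_snapshot_checked() {
  local url="$1" state
  state=$(curl -sS -X PUT "$url?wait_for_completion=true" \
    -H 'Content-Type: application/json' \
    -d '{"indices": "index-1,index-2", "ignore_unavailable": true, "include_global_state": false}' \
    | snapshot_state)
  echo "Snapshot state: ${state:-unknown}"
  [ "$state" = "SUCCESS" ]
}
```

A script could then run take_snapshot_checked with the snapshot URL and exit with an error when it returns non-zero. For anything beyond this simple check, a real JSON tool would be a more robust choice than sed.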

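Adding the Script to Crontab: After making the script executable (chmod +x), we can add the following line to crontab to run the backup every day at 12 a.m. UTC (cron uses the server's time zone, so this assumes the server clock is set to UTC):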

0 0 * * * /home/ubuntu/daily_elastic_search_backup.sh > /home/ubuntu/daily_elastic_search_backup.log 2>&1
