Elasticsearch: Snapshot and Restore with AWS S3

mhihasan

Hasanul Islam

Posted on June 30, 2020

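Elasticsearch provides a simple way to back up and restore data through snapshots. In this tutorial, we will store the backups in AWS S3. We will take snapshots, restore from a snapshot, and create a cron job that takes a snapshot daily. This tutorial is divided into the following sections:

  • Snapshot
  • Snapshot Repository
  • Registering Snapshot Repository
  • Taking Snapshot
  • Restoring From Snapshot
  • Monitoring Snapshot and Restore Progress
  • Daily Backup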

Snapshot :

A snapshot is a backup taken from a running Elasticsearch cluster. We can take a snapshot of individual indices or of the entire cluster. Snapshots are incremental: each snapshot of an index stores only the data that is not already part of an earlier snapshot in the same repository.

Snapshot Repository :

A snapshot repository is a storage location that holds snapshots. Snapshots can be stored in either local (shared file system) or remote repositories. Remote repositories can reside on AWS S3, HDFS, Azure, Google Cloud Storage, and other platforms supported by a repository plugin.

To retrieve information about all registered snapshot repositories:

curl -X GET "localhost:9200/_snapshot/_all?pretty"
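In scripts, it is often more convenient to check for a single repository than to read the whole listing. The following is a minimal sketch that relies on `GET _snapshot/<repository>` returning HTTP 200 for a registered repository and 404 for a missing one. `ES_URL`, `repo_status`, and `repo_exists` are hypothetical names chosen for this example. The functions can be sourced into a shell session or a script.

```shell
# Hypothetical base URL; defaults to the local node used throughout this tutorial.
ES_URL="${ES_URL:-http://localhost:9200}"

# Prints only the HTTP status code of GET _snapshot/<repo>
# (200 when the repository is registered, 404 when it is not).
repo_status() {
  curl -s -o /dev/null -w '%{http_code}' "$ES_URL/_snapshot/$1"
}

# Returns 0 if the repository is registered, 1 otherwise.
repo_exists() {
  [ "$(repo_status "$1")" = "200" ]
}
```

Usage: `repo_exists REPOSITORY_NAME && echo "REPOSITORY_NAME is registered"`.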

Registering Snapshot Repository :

We must register a repository before we can take snapshots or restore from it. To register AWS S3 as a snapshot repository, we will follow these steps:

AWS Setup :

  • S3 Bucket: We will create an S3 bucket, referred to in this guide as S3-BUCKET-NAME. Replace it with your own bucket name; S3 bucket names must be lowercase.
  • Custom Policy: We will create a custom policy with the following policy document:
{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::S3-BUCKET-NAME"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::S3-BUCKET-NAME/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
  • IAM User: Next, we will create an IAM user with programmatic access and attach the custom policy to it. Note the user's access key ID and secret access key; they are referred to below as ACCESS_KEY_ID and SECRET_ACCESS_KEY.
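The policy and user steps above can also be scripted with the AWS CLI. The following is a sketch only. It assumes the `aws` CLI is installed and configured with credentials that are allowed to manage IAM. The function names and the `<user>-snapshot-policy` policy name are hypothetical. `render_s3_policy` prints the same policy document shown above for a given bucket.

```shell
# Prints the custom policy from this section for the given bucket name.
render_s3_policy() {
  local bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation",
                 "s3:ListBucketMultipartUploads", "s3:ListBucketVersions"],
      "Resource": ["arn:aws:s3:::${bucket}"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject",
                 "s3:AbortMultipartUpload", "s3:ListMultipartUploadParts"],
      "Resource": ["arn:aws:s3:::${bucket}/*"]
    }
  ]
}
EOF
}

# Creates the policy and the IAM user, attaches the policy, then prints a new access key.
create_backup_user() {
  local user="$1" bucket="$2" policy_arn
  policy_arn=$(aws iam create-policy \
    --policy-name "${user}-snapshot-policy" \
    --policy-document "$(render_s3_policy "$bucket")" \
    --query 'Policy.Arn' --output text) || return 1
  aws iam create-user --user-name "$user" > /dev/null || return 1
  aws iam attach-user-policy --user-name "$user" --policy-arn "$policy_arn" || return 1
  # The JSON output contains AccessKeyId and SecretAccessKey; store them securely.
  aws iam create-access-key --user-name "$user"
}
```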

S3 Elasticsearch Plugin Installation :

  • Install the repository-s3 plugin on every node in the cluster, then restart each node:
    cd /usr/share/elasticsearch
    sudo bin/elasticsearch-plugin install --batch repository-s3
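  • For a quick setup, add the line -Des.allow_insecure_settings=true to /etc/elasticsearch/jvm.options and restart the node. This allows the access key and secret key to be passed in the repository settings in the next step. For a more secure setup, store the credentials in the Elasticsearch keystore with elasticsearch-keystore instead.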
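For the keystore-based setup mentioned above, the S3 plugin reads credentials from the `s3.client.default.access_key` and `s3.client.default.secret_key` keystore settings. The following is a minimal sketch, meant to be run on every node as a user that can write the Elasticsearch config directory. `ES_HOME`, `ES_URL`, and the helper function names are hypothetical, and `ES_HOME` defaults to the path used above. With credentials stored this way, `-Des.allow_insecure_settings=true` is not needed, and the repository settings should omit `access_key` and `secret_key`.

```shell
# Hypothetical locations; defaults match the paths used in this tutorial.
ES_HOME="${ES_HOME:-/usr/share/elasticsearch}"
ES_URL="${ES_URL:-http://localhost:9200}"

# Thin wrapper around the keystore CLI.
keystore() {
  "$ES_HOME/bin/elasticsearch-keystore" "$@"
}

# --stdin reads each value from standard input, keeping it out of the process list;
# -f overwrites an existing entry without prompting.
store_s3_credentials() {
  local access_key="$1" secret_key="$2"
  printf '%s' "$access_key" | keystore add --stdin -f s3.client.default.access_key || return 1
  printf '%s' "$secret_key" | keystore add --stdin -f s3.client.default.secret_key || return 1
}

# S3 client credentials are reloadable secure settings, so running nodes
# can pick up the new values without a restart.
reload_secure_settings() {
  curl -s -X POST "$ES_URL/_nodes/reload_secure_settings"
}
```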

Snapshot Repository Registration :
To store backups in this S3 bucket, we must register the bucket as a snapshot repository. We can do this from the command line:

curl -X PUT "localhost:9200/_snapshot/REPOSITORY_NAME?pretty" -H 'Content-Type: application/json' -d'
{
 "type": "s3",
 "settings": {
   "bucket": "S3-BUCKET-NAME",
   "region": "AWS_REGION",
   "access_key": "ACCESS_KEY_ID",
   "secret_key": "SECRET_ACCESS_KEY"
 }
}
'
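If the credentials are stored in the keystore, the registration body needs no keys. Afterwards, the repository can be checked with `POST _snapshot/<repository>/_verify`, which lists the nodes that could access it. The following is a sketch under these assumptions. `ES_URL` and the function names are hypothetical, and `"client": "default"` selects the S3 client whose credentials were stored in the keystore.

```shell
# Hypothetical base URL; defaults to the local node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Registers an S3 repository without inline credentials.
register_s3_repository() {
  local repo="$1" bucket="$2"
  curl -s -X PUT "$ES_URL/_snapshot/$repo" -H 'Content-Type: application/json' \
    -d "{\"type\": \"s3\", \"settings\": {\"bucket\": \"$bucket\", \"client\": \"default\"}}"
}

# Asks every node to confirm it can read and write the repository.
verify_repository() {
  curl -s -X POST "$ES_URL/_snapshot/$1/_verify"
}
```

Usage: `register_s3_repository REPOSITORY_NAME S3-BUCKET-NAME && verify_repository REPOSITORY_NAME`.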

Taking Snapshot :

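We can take a snapshot of a running Elasticsearch cluster with the following command. SNAPSHOT_NAME must be lowercase and unique within REPOSITORY_NAME. With wait_for_completion=true, the request returns only after the snapshot has finished.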

curl -X PUT "localhost:9200/_snapshot/REPOSITORY_NAME/SNAPSHOT_NAME?wait_for_completion=true&pretty" -H 'Content-Type:application/json' -d'
{
  "indices": "index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": false
}
'
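A successful HTTP response does not by itself mean the snapshot succeeded. The response body reports the final `state`, which can be `SUCCESS`, `PARTIAL`, or `FAILED`. The following is a minimal sketch that wraps the command above and fails unless the state is `SUCCESS`. `ES_URL` and `take_snapshot` are hypothetical names, and the string match assumes the compact JSON that Elasticsearch returns without `?pretty`.

```shell
# Hypothetical base URL; defaults to the local node.
ES_URL="${ES_URL:-http://localhost:9200}"

# take_snapshot REPO SNAPSHOT "index_1,index_2"
take_snapshot() {
  local repo="$1" snapshot="$2" indices="$3" response
  response=$(curl -s -X PUT \
    "$ES_URL/_snapshot/$repo/$snapshot?wait_for_completion=true" \
    -H 'Content-Type: application/json' \
    -d "{\"indices\": \"$indices\", \"ignore_unavailable\": true, \"include_global_state\": false}") || return 1
  # Treat anything other than SUCCESS (for example PARTIAL or FAILED) as an error.
  case "$response" in
    *'"state":"SUCCESS"'*) return 0 ;;
    *) echo "snapshot $snapshot did not succeed: $response" >&2; return 1 ;;
  esac
}
```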

All snapshots currently stored in the repository can be listed using the following command:

curl -X GET "localhost:9200/_snapshot/REPOSITORY_NAME/_all?pretty"
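When restoring, we usually want the most recent snapshot that completed successfully. The `_cat/snapshots` API returns one line per snapshot and can be limited to chosen columns (`h=`) and sorted (`s=`). The following is a sketch that uses it for this purpose. `ES_URL` and `latest_successful_snapshot` are hypothetical names.

```shell
# Hypothetical base URL; defaults to the local node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Prints the id of the most recently finished snapshot whose status is SUCCESS.
latest_successful_snapshot() {
  curl -s "$ES_URL/_cat/snapshots/$1?h=id,status&s=end_epoch" |
    awk '$2 == "SUCCESS" { last = $1 } END { if (last != "") print last }'
}
```

Usage: `latest_successful_snapshot REPOSITORY_NAME` prints a name that can be used as SNAPSHOT_NAME in the restore command below.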

Restoring From Snapshot :

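To restore indices from a snapshot in S3, we use the restore API. An open index cannot be restored over an existing index with the same name, so this example renames the restored indices with rename_pattern and rename_replacement (for example, index_1 is restored as restored_index_1):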

curl -X POST "localhost:9200/_snapshot/REPOSITORY_NAME/SNAPSHOT_NAME/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "index_(.+)",
  "rename_replacement": "restored_index_$1",
  "include_aliases": false
}
'
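The same restore can be parameterized so that every index in the list gets a common prefix. The following is a sketch that assumes the Java-style `$1` group reference in `rename_replacement` used above. With `wait_for_completion=true`, the request also blocks until the restore finishes. `ES_URL` and `restore_with_prefix` are hypothetical names.

```shell
# Hypothetical base URL; defaults to the local node.
ES_URL="${ES_URL:-http://localhost:9200}"

# restore_with_prefix REPO SNAPSHOT "index_1,index_2" restored_
restore_with_prefix() {
  local repo="$1" snapshot="$2" indices="$3" prefix="$4" body
  # rename_pattern captures the whole index name; rename_replacement prepends the prefix.
  body=$(printf '{"indices": "%s", "ignore_unavailable": true, "include_global_state": false, "rename_pattern": "(.+)", "rename_replacement": "%s$1", "include_aliases": false}' "$indices" "$prefix")
  curl -s -X POST "$ES_URL/_snapshot/$repo/$snapshot/_restore?wait_for_completion=true" \
    -H 'Content-Type: application/json' -d "$body"
}
```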

Monitoring Snapshot and Restore Progress :

We can monitor the status of a snapshot with the following command:

curl -X GET "localhost:9200/_snapshot/REPOSITORY_NAME/SNAPSHOT_NAME/_status?pretty"
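When a snapshot is started without `wait_for_completion=true`, a script can poll until it finishes. The following is a sketch that polls the get-snapshot API (`GET _snapshot/<repository>/<snapshot>`), whose `state` stays `IN_PROGRESS` while the snapshot runs. For restores, progress is reported per shard by the `_cat/recovery` API. `ES_URL`, the function names, and the default 10-second interval are assumptions for illustration.

```shell
# Hypothetical base URL; defaults to the local node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Polls until the snapshot leaves IN_PROGRESS.
# Returns 0 on SUCCESS and 1 on any other final state (e.g. FAILED or PARTIAL).
wait_for_snapshot() {
  local repo="$1" snapshot="$2" interval="${3:-10}" response
  while :; do
    response=$(curl -s "$ES_URL/_snapshot/$repo/$snapshot") || return 1
    case "$response" in
      *'"state":"IN_PROGRESS"'*) sleep "$interval" ;;
      *'"state":"SUCCESS"'*) return 0 ;;
      *) echo "snapshot $snapshot ended unsuccessfully: $response" >&2; return 1 ;;
    esac
  done
}

# Shows shard recoveries (including restores) that are still running.
restore_progress() {
  curl -s "$ES_URL/_cat/recovery?active_only=true&v"
}
```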

Daily Backup :

  • Creating a Bash Script: We will create a Bash script, e.g. /home/ubuntu/daily_elastic_search_backup.sh, as follows:
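#!/bin/bash
# Exit on errors, unset variables, and failures inside pipelines.
set -euo pipefail

# Use the UTC date so the snapshot name matches the UTC schedule below.
TODAY=$(date -u +'%Y.%m.%d')
echo "Storing indices for $TODAY in S3."

ELASTIC_SEARCH_HOST="localhost"
ELASTIC_SEARCH_PORT="9200"
REPOSITORY_NAME="REPOSITORY_NAME"
SNAPSHOT_NAME="snapshot-$TODAY"

echo "Starting snapshot $SNAPSHOT_NAME"

# --fail makes curl exit non-zero on HTTP errors (e.g. if the snapshot name already exists).
RESPONSE=$(curl -sS --fail -X PUT "http://$ELASTIC_SEARCH_HOST:$ELASTIC_SEARCH_PORT/_snapshot/$REPOSITORY_NAME/$SNAPSHOT_NAME?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
  "indices": "index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": false
}
')

# With wait_for_completion=true, the response reports the final state of the snapshot.
if [[ "$RESPONSE" == *'"state":"SUCCESS"'* ]]; then
  echo "Successfully completed storing $SNAPSHOT_NAME in S3"
else
  echo "Snapshot $SNAPSHOT_NAME did not succeed: $RESPONSE" >&2
  exit 1
fi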
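  • Adding the Script to Crontab: Make the script executable (chmod +x /home/ubuntu/daily_elastic_search_backup.sh), then add the following line with crontab -e to back up every day at 12 AM. Cron uses the server's local time zone, so this runs at midnight UTC only if the server clock is set to UTC: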
0 0 * * * /home/ubuntu/daily_elastic_search_backup.sh > /home/ubuntu/daily_elastic_search_backup.log 2>&1
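On servers provisioned by scripts, the cron entry can be added without opening an editor. The following is a sketch of a common idiom: read the current crontab with `crontab -l`, append the line only if it is not already present, and write the result back with `crontab -`. `add_cron_entry` is a hypothetical helper name.

```shell
# Adds a line to the current user's crontab unless an identical line already exists.
add_cron_entry() {
  local entry="$1" current
  # crontab -l fails when the user has no crontab yet; treat that as empty.
  current=$(crontab -l 2>/dev/null) || current=""
  if printf '%s\n' "$current" | grep -Fqx -- "$entry"; then
    return 0
  fi
  { [ -n "$current" ] && printf '%s\n' "$current"; printf '%s\n' "$entry"; } | crontab -
}
```

Usage: `add_cron_entry '0 0 * * * /home/ubuntu/daily_elastic_search_backup.sh > /home/ubuntu/daily_elastic_search_backup.log 2>&1'`. Running it twice leaves a single copy of the line.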