How to migrate Apache Solr from the existing cluster to Amazon EKS

haintkit

Hai Nguyen

Posted on August 31, 2023


Co-author: @coangha21

Solr is an open-source enterprise-search platform, written in Java. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is designed for scalability and fault tolerance. Solr is widely used for enterprise search and analytics use cases and has an active development community and regular releases.

Solr runs as a standalone full-text search server. It uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable from most popular programming languages. Solr's external configuration allows it to be tailored to many types of applications without Java coding, and it has a plugin architecture to support more advanced customization.

In this article, I'll walk through migrating Solr from a Kubernetes cluster to Amazon EKS (Elastic Kubernetes Service) using the backup and restore method. Keep in mind that, depending on your Solr version, system requirements, and circumstances, you may need extra setup on Amazon EFS so that your Kubernetes clusters can access a shared network file system on AWS. This is not required if your Solr version can use S3 as a backup repository; see the Solr and Solr Operator documentation on backup repositories for details.

For demonstration purposes, I'll migrate Solr from one EKS cluster to another within the same region. However, you can apply this migration method to a Kubernetes cluster running on any platform.

Prerequisites:

Before you begin, make sure you have the following available:

  • An AWS account with permissions to create the resources below

  • Terraform (or the AWS CLI), kubectl, and Helm installed on your machine
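Before you start, a quick way to confirm these tools are installed is a small shell check. This is a minimal sketch; require_tools is a hypothetical helper, not part of any of the tools listed:

```shell
# Hypothetical helper: report each listed command that is missing from PATH.
# Returns 0 when every tool is found, 1 otherwise.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: the AWS CLI is included because "aws eks update-kubeconfig" is the
# usual way to point kubectl at an EKS cluster.
#   require_tools terraform aws kubectl helm
```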

Here are the steps we will follow in this article:

Step 1: Create the target EKS cluster.

Step 2: Install Solr using Helm.

Step 3: Set up Solr backup storage.

Step 4: Create a backup from the origin cluster.

Step 5: Restore Solr to EKS from the backup.

Let's go into the details!

Create target EKS cluster

There are many ways to create a cluster, for example with eksctl. Here I use Terraform modules because they are easy to reuse and understand.

This is my Terraform template for the cluster. You can copy and run it as is, or customize it to your needs:



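terraform {
  required_providers {
    # kubectl is a community provider: without an explicit source,
    # terraform init looks for hashicorp/kubectl, which does not exist
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.14"
    }
    # the nested kubernetes { } block used below is Helm provider 2.x syntax
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.9"
    }
  }
}

provider "aws" {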
  region = "ap-southeast-1"
  default_tags {
    tags = {
      environment = "Dev"
    }
  }
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  token                  = data.aws_eks_cluster_auth.this.token
}

provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
    token                  = data.aws_eks_cluster_auth.this.token
  }
}

provider "kubectl" {
  apply_retry_count      = 10
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  load_config_file       = false
  token                  = data.aws_eks_cluster_auth.this.token
}

data "aws_eks_cluster_auth" "this" {
  name = module.eks.cluster_name
}

data "aws_availability_zones" "available" {} 

locals {
  region = "ap-southeast-1"

  vpc_cidr = "10.0.0.0/16"
  azs      = slice(data.aws_availability_zones.available.names, 0, 3)
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.12"

  ## EKS Cluster Config
  cluster_name       = "solr-demo"
  cluster_version    = "1.25"

  ## VPC Config
  vpc_id                   = module.vpc.vpc_id
  subnet_ids               = module.vpc.private_subnets

  # EKS Cluster Network Config
  cluster_endpoint_private_access      = true
  cluster_endpoint_public_access       = true

  ## EKS Worker
  eks_managed_node_groups  = {
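    "solr-nodegroup" = {
      # The map key above becomes the node group name. Keys from the older
      # EKS Blueprints syntax (node_group_name, public_ip, pre_userdata) are not
      # inputs of this module and are silently ignored; nodes in private
      # subnets get no public IP anyway.
      pre_bootstrap_user_data = <<-EOF
          yum install -y amazon-ssm-agent
          systemctl enable amazon-ssm-agent && systemctl start amazon-ssm-agent
        EOF
      desired_size       = 2
      ami_type           = "AL2_x86_64"
      capacity_type      = "ON_DEMAND"
      instance_types     = ["t3.medium"]
      # disk_size only applies when use_custom_launch_template = false (the
      # module default is true), so the root volume is sized explicitly
      block_device_mappings = {
        xvda = {
          device_name = "/dev/xvda"
          ebs = {
            volume_size = 30
            volume_type = "gp3"
          }
        }
      }
    }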
  }
}

module "eks_blueprints_addons_common" {
  source  = "aws-ia/eks-blueprints-addons/aws"
  version = "~> 1.3.0"

  cluster_name      = module.eks.cluster_name
  cluster_endpoint  = module.eks.cluster_endpoint
  cluster_version   = module.eks.cluster_version
  oidc_provider_arn = module.eks.oidc_provider_arn

  create_delay_dependencies = [for ng in module.eks.eks_managed_node_groups: ng.node_group_arn]

  eks_addons = {
    aws-ebs-csi-driver = {
      service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
    }
    vpc-cni = {
      service_account_role_arn = module.aws_node_irsa.iam_role_arn
    }
    coredns = {
    }
    kube-proxy = {
    }
  }
  enable_aws_efs_csi_driver = true
}

## Resource for VPC CNI Addon
module "aws_node_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.20"

  role_name_prefix = "${module.eks.cluster_name}-aws-node-"

  attach_vpc_cni_policy = true
  vpc_cni_enable_ipv4   = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:aws-node"]
    }
  }
}

module "ebs_csi_driver_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.20"

  role_name_prefix = "${module.eks.cluster_name}-ebs-csi-driver-"

  attach_ebs_csi_policy = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
    }
  }
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "solr-demo-subnet"
  cidr = local.vpc_cidr

  azs             = local.azs
  private_subnets = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 4, k)]
  public_subnets  = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 8, k + 48)]

  enable_nat_gateway = true
  single_nat_gateway = true

  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }
}



The following AWS resources will be created:

  • A VPC with private and public subnets.

  • An EKS cluster with a managed node group (two t3.medium nodes), the EKS add-ons aws-ebs-csi-driver, vpc-cni, coredns, and kube-proxy, and the EFS CSI driver installed through the EKS Blueprints add-ons module.
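To apply the template and point kubectl at the new cluster, a sequence like the one below should work. It is a sketch that assumes the Terraform files are in the current directory and that you kept cluster_name = "solr-demo" and the ap-southeast-1 region; provision_cluster is a hypothetical wrapper around the standard terraform init/apply and aws eks update-kubeconfig commands:

```shell
# Hypothetical wrapper: provision with Terraform, then add a kubeconfig entry
# for the new cluster so kubectl and helm can reach it.
provision_cluster() {
  local cluster="${1:-solr-demo}"      # cluster_name in the template
  local region="${2:-ap-southeast-1}"  # provider region in the template
  terraform init &&
    terraform apply &&
    aws eks update-kubeconfig --region "$region" --name "$cluster" &&
    kubectl get nodes
}

# Usage:
#   provision_cluster                 # defaults: solr-demo, ap-southeast-1
#   provision_cluster my-cluster us-east-1
```

terraform apply prompts for confirmation before it creates anything, so you can review the plan first.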

You can verify the resources in the AWS Management Console:

EKS Cluster

EKS Add-ons

Install Solr using Helm

The next step is to install the Solr Operator and Solr using Helm:



### Add the Apache Solr Helm repository
helm repo add apache-solr https://solr.apache.org/charts
helm repo update

### Install the Solr Operator CRDs (including the Zookeeper Operator CRDs), then the Solr Operator
kubectl create -f https://solr.apache.org/operator/downloads/crds/<version>/all-with-dependencies.yaml
helm install solr-operator apache-solr/solr-operator --version <version>

### Install Solr and ZooKeeper into the solr namespace (created if missing)
helm install solr apache-solr/solr -n solr --create-namespace --version <version>



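Replace <version> with the chart version you want to install. The CRDs and the solr-operator chart should use the same operator version, and for the solr chart, ideally pick a version (or set image.tag) that runs the same Solr version as your origin cluster so the backup restores cleanly.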

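Next, run these commands to retrieve the admin password and access the Solr UI. Depending on how your SolrCloud resource is named, the Secret and Service may carry a prefix (for example <name>-solrcloud-common); kubectl get secret,svc -n solr lists the exact names.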



### Get solr password
kubectl get secret solrcloud-security-bootstrap -n solr -o jsonpath='{.data.admin}' | base64 --decode
### Port forward Solr UI
kubectl port-forward service/solrcloud-common 3000:80 -n solr



Now open http://localhost:3000 in your browser and log in with the admin password retrieved above.
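If you will call the Solr API from the command line later, it can be convenient to keep the password in a shell variable instead of printing it. A sketch, assuming the Secret name and namespace from the command above (override them if your resources carry a prefix); solr_admin_password is a hypothetical helper:

```shell
# Hypothetical helper: print the bootstrap admin password stored in the Secret.
# Arguments (optional): Secret name, namespace.
solr_admin_password() {
  local secret="${1:-solrcloud-security-bootstrap}"
  local namespace="${2:-solr}"
  kubectl get secret "$secret" -n "$namespace" \
    -o jsonpath='{.data.admin}' | base64 --decode
}

# Usage:
#   SOLR_PASS="$(solr_admin_password)"
#   curl --user "admin:${SOLR_PASS}" 'http://localhost:3000/solr/admin/info/system'
```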

Set up Solr backup storage

Once Solr is installed, it's time to set up the backup storage. At the time of writing, there are two AWS storage options for Solr backups: EFS (mounted into the Solr pods as a shared volume) and S3. Depending on your Solr version and system requirements, you can choose either. In this demo, I'll use EFS because a shared volume works with most Solr versions, whereas S3 as a backup repository requires a more recent Solr release. For more information, see the Solr Operator documentation on backup repositories.

To use EFS as Solr's backup storage, first create an EFS file system. This Terraform template creates one, with a mount target in each private subnet:



module "efs" {
  source  = "terraform-aws-modules/efs/aws"
  version = "1.2.0"

  # File system
  name           = "solr-backup-storage"

  performance_mode = "generalPurpose"
  throughput_mode  = "bursting"

  # Mount targets / security group
  mount_targets = {
    "ap-southeast-1a" = {
      subnet_id = module.vpc.private_subnets[0]
    }
    "ap-southeast-1b" = {
      subnet_id = module.vpc.private_subnets[1]
    }
    "ap-southeast-1c" = {
      subnet_id = module.vpc.private_subnets[2]
    }
  }

  deny_nonsecure_transport = false

  security_group_description = "EFS security group"
  security_group_vpc_id      = module.vpc.vpc_id
  security_group_rules       = {
    "private-subnet" = {
      # CIDR ranges of the private subnets (private_subnets holds subnet IDs)
      cidr_blocks = module.vpc.private_subnets_cidr_blocks
    }
  }
}



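In your solr-efs-pvc.yaml manifest, which must define the PersistentVolumeClaim solr-efs-claim referenced later in values.yaml, replace EFS-id with the ID of the file system you just created (e.g. fs-1234567890abcdef), then run: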



kubectl apply -f solr-efs-pvc.yaml -n solr
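The contents of solr-efs-pvc.yaml are not shown in this post. If you need to write one, the sketch below renders a possible version, assuming static provisioning through the EFS CSI driver installed with the cluster: a PersistentVolume whose volumeHandle is the file system ID, and a PersistentVolumeClaim named solr-efs-claim. The helper render_efs_pv_pvc, the PV name solr-efs-pv, and the 5Gi size are illustrative assumptions; Kubernetes requires a capacity value, but EFS does not enforce it.

```shell
# Hypothetical helper: print a PersistentVolume + PersistentVolumeClaim for an
# existing EFS file system (static provisioning via the efs.csi.aws.com driver).
render_efs_pv_pvc() {
  if [ -z "${1:-}" ]; then
    echo "usage: render_efs_pv_pvc <efs-file-system-id>" >&2
    return 1
  fi
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: solr-efs-pv
spec:
  capacity:
    storage: 5Gi            # required by Kubernetes, not enforced by EFS
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany         # every Solr pod mounts the same backup directory
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  csi:
    driver: efs.csi.aws.com
    volumeHandle: $1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: solr-efs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: solr-efs-pv   # bind directly to the PersistentVolume above
  resources:
    requests:
      storage: 5Gi
EOF
}

# Usage (the lookup assumes the file system has the Name tag
# solr-backup-storage; otherwise copy the ID from the console):
#   EFS_ID=$(aws efs describe-file-systems \
#     --query "FileSystems[?Name=='solr-backup-storage'].FileSystemId" --output text)
#   render_efs_pv_pvc "$EFS_ID" > solr-efs-pvc.yaml
#   kubectl apply -f solr-efs-pvc.yaml -n solr
```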



EFS is now available to your cluster. Next, upgrade the Solr release so that it uses EFS as a backup repository. First, create a values.yaml file:



backupRepositories:
  - name: "solr-backup"
    volume:
      source: # Required
        persistentVolumeClaim:
          claimName: "solr-efs-claim"
      directory: "solr-backup/" # Optional



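Note that you need to do this on both the origin and the target cluster, and both must mount the same EFS file system so that a backup written by the origin is visible to the target.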

Second, roll out the change with Helm:



helm upgrade --install solr -f values.yaml apache-solr/solr -n solr --version <version> 
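Because both clusters need this change, one option is to loop over their kubeconfig contexts with Helm's --kube-context flag. This sketch assumes both clusters run an apache-solr/solr release named solr in the solr namespace, that each already has the solr-efs-claim PVC, and that values.yaml is in the current directory; upgrade_solr_everywhere and the context names are placeholders:

```shell
# Hypothetical helper: roll out the same values.yaml to the Solr release in
# every kubeconfig context given. Stops at the first failing context.
upgrade_solr_everywhere() {
  if [ "$#" -lt 2 ]; then
    echo "usage: upgrade_solr_everywhere <chart-version> <context>..." >&2
    return 1
  fi
  local chart_version="$1" ctx
  shift
  for ctx in "$@"; do
    echo "==> upgrading Solr in context: $ctx" >&2
    helm upgrade --install solr apache-solr/solr \
      -f values.yaml -n solr --version "$chart_version" \
      --kube-context "$ctx" || return 1
  done
}

# Usage (list your context names with: kubectl config get-contexts):
#   upgrade_solr_everywhere <version> origin-context target-context
```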



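Finally, check that the Solr StatefulSet now mounts the EFS claim. The operator names the StatefulSet <solrcloud-name>-solrcloud (dica-solrcloud in my setup, so adjust it to yours). The expected output is shown below the command: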



kubectl describe statefulset/dica-solrcloud -n solr | grep solr-efs-claim
    ClaimName:  solr-efs-claim



Create a backup from the origin cluster

To restore Solr on the new cluster, you first need a backup. Back up each collection from the origin cluster with the BACKUP action of the Solr Collections API:



curl --user 'admin:<password>' 'https://<origin-solr-endpoint>/solr/admin/collections?action=BACKUP&name=<backup-name>&collection=<collection-name>&location=/var/solr/data/backup-restore/solr-backup&repository=solr-backup'



Replace <password> with the admin password, <origin-solr-endpoint> with your origin Solr endpoint, and <backup-name> and <collection-name> with values of your choice. If you have more than one collection, repeat the command for each one.
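Rather than repeating the command by hand, you can loop over collection names. In this sketch, curl's -G with --data-urlencode builds an encoded query string, so no unquoted & ever reaches the shell. The helper for_each_collection and the convention of naming each backup <prefix>-<collection> are assumptions for illustration; the repository name and location are the ones used in this post, and the same loop can drive RESTORE on the target cluster.

```shell
# Hypothetical helper: run one Collections API action (BACKUP or RESTORE)
# per collection, naming each backup <prefix>-<collection>.
# Usage: for_each_collection <BACKUP|RESTORE> <base-url> <user:pass> <prefix> <coll>...
for_each_collection() {
  if [ "$#" -lt 5 ]; then
    echo "usage: for_each_collection <action> <base-url> <user:pass> <prefix> <coll>..." >&2
    return 1
  fi
  local action="$1" base_url="$2" credentials="$3" prefix="$4" coll
  shift 4
  case "$action" in
    BACKUP|RESTORE) ;;
    *) echo "unsupported action: $action" >&2; return 1 ;;
  esac
  for coll in "$@"; do
    # -G appends the URL-encoded --data-urlencode pairs as the query string.
    curl --fail --silent --show-error --user "$credentials" -G \
      "${base_url}/solr/admin/collections" \
      --data-urlencode "action=${action}" \
      --data-urlencode "name=${prefix}-${coll}" \
      --data-urlencode "collection=${coll}" \
      --data-urlencode "location=/var/solr/data/backup-restore/solr-backup" \
      --data-urlencode "repository=solr-backup" || return 1
  done
}

# Usage (placeholders as in the commands in this post):
#   for_each_collection BACKUP  'https://<origin-solr-endpoint>' 'admin:<password>' migration products orders
#   for_each_collection RESTORE 'http://localhost:3000' 'admin:<password>' migration products orders
```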

You can check the backup progress by opening a shell in a Solr pod with kubectl exec and listing the backup directory, /var/solr/data/backup-restore/solr-backup.
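For large collections, a synchronous BACKUP call can take a long time. The Collections API also accepts async=<request-id> on BACKUP and RESTORE; you then poll action=REQUESTSTATUS&requestid=<request-id> until the state becomes completed or failed. The sketch below covers the polling side. It extracts the state with sed rather than a JSON parser, and wait_for_request and its 10-second default interval (POLL_INTERVAL) are arbitrary choices for illustration:

```shell
# Hypothetical helper: poll REQUESTSTATUS for an async Collections API call.
# Returns 0 when the request completed, 1 on failure or an unexpected state.
wait_for_request() {
  local base_url="$1" credentials="$2" request_id="$3"
  local body state
  while :; do
    body=$(curl --fail --silent --show-error --user "$credentials" -G \
      "${base_url}/solr/admin/collections" \
      --data-urlencode "action=REQUESTSTATUS" \
      --data-urlencode "requestid=${request_id}" \
      --data-urlencode "wt=json") || return 1
    # Pull the value of "state":"..." out of the JSON response.
    state=$(printf '%s\n' "$body" |
      sed -n 's/.*"state" *: *"\([a-z]*\)".*/\1/p' | head -n 1)
    echo "request ${request_id}: ${state:-unknown}" >&2
    case "$state" in
      completed) return 0 ;;
      submitted|running) ;;   # still in progress: poll again
      *) return 1 ;;          # failed, notfound, or unparsable response
    esac
    sleep "${POLL_INTERVAL:-10}"
  done
}

# Usage: add async=backup-1 to the BACKUP URL, then
#   wait_for_request 'https://<origin-solr-endpoint>' 'admin:<password>' backup-1
```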

Restore Solr to target EKS cluster

Because the origin and target Solr backup repositories point to the same directory on EFS (as set up in the previous steps), you only need to call the RESTORE action on the target cluster:



curl --user 'admin:<password>' 'http://localhost:3000/solr/admin/collections?action=RESTORE&name=<backup-name>&collection=<collection-name>&location=/var/solr/data/backup-restore/solr-backup&repository=solr-backup'



Because I'm using port forwarding, the target Solr endpoint is simply localhost:3000. Finally, open the Solr UI; you should see that the collection has been restored to your new EKS cluster.

Restored collection
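Beyond checking the UI, a quick sanity check is to compare document counts per collection on both clusters. This sketch runs a standard select query with rows=0 and extracts numFound with sed; num_found is a hypothetical helper, and the counts only match if nothing was written to the origin after the backup was taken.

```shell
# Hypothetical helper: print numFound for q=*:* on one collection.
# Usage: num_found <base-url> <user:pass> <collection>
num_found() {
  local base_url="$1" credentials="$2" collection="$3" body
  body=$(curl --fail --silent --show-error --user "$credentials" -G \
    "${base_url}/solr/${collection}/select" \
    --data-urlencode "q=*:*" \
    --data-urlencode "rows=0" \
    --data-urlencode "wt=json") || return 1
  # Take the first "numFound":<digits> value from the JSON response.
  printf '%s\n' "$body" |
    sed -n 's/.*"numFound" *: *\([0-9]*\).*/\1/p' | head -n 1
}

# Usage (placeholders as above):
#   [ "$(num_found 'https://<origin-solr-endpoint>' 'admin:<password>' products)" = \
#     "$(num_found 'http://localhost:3000' 'admin:<password>' products)" ] && echo "counts match"
```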

After that, you can set up autoscaling, ingress, security, and other resources for Solr in the new EKS cluster, and connect your application to it.

For further information on Solr's backup and restore API, see the backup and restore documentation in the Solr Reference Guide.

Conclusion

Solr is widely used by enterprises and SMBs. Depending on your system requirements and circumstances, migrating Solr to Amazon EKS will require different setups and approaches. I hope this post gives you useful information about migrating Solr using backup and restore. Any comments are welcome. Thank you for reading!

Thanks to my co-author @coangha21 for the effort on this post!
