Hai Nguyen
Posted on August 31, 2023
Co-author: @coangha21
Solr is an open-source enterprise-search platform, written in Java. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is designed for scalability and fault tolerance. Solr is widely used for enterprise search and analytics use cases and has an active development community and regular releases.
Solr runs as a standalone full-text search server. It uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable from most popular programming languages. Solr's external configuration allows it to be tailored to many types of applications without Java coding, and it has a plugin architecture to support more advanced customization.
In this article, I'll walk through the process of migrating Solr from a Kubernetes cluster to Amazon EKS (Elastic Kubernetes Service) using the backup and restore method. Please note that, depending on your Solr version, system requirements, circumstances, etc., you may need extra setup on EFS to ensure your Kubernetes cluster can access the network file system on AWS. This is not required if your Solr version can use S3 as a backup repository. Please refer to the links below:
- Solr Operator documentation
- Working with Amazon EFS access points - EFS
- Is it possible to make EFS publicly accessible?
For demonstration purposes, I'll migrate Solr from one EKS cluster to another within the same region. However, you can apply this migration method to any Kubernetes cluster running on any platform.
Prerequisites:
Before you begin, make sure you have the following available:
An AWS account and the permissions required to create resources
Terraform or the AWS CLI, plus kubectl and helm, installed on your machine
These are the steps we will go through in this article:
Step 1: Create target EKS cluster.
Step 2: Install Solr using Helm.
Step 3: Setup Solr backup storage.
Step 4: Create backup from origin cluster.
Step 5: Restore Solr to EKS using backup.
Let's go into the details!
Create target EKS cluster
There are many ways to create a cluster, such as using eksctl. In my case, I will use Terraform modules because they are easy to reuse and understand.
This is my Terraform code template for creating the cluster. You can copy and run it as-is or customize it to your desired configuration:
provider "aws" {
region = "ap-southeast-1"
default_tags {
tags = {
environment = "Dev"
}
}
}
provider "kubernetes" {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
token = data.aws_eks_cluster_auth.this.token
}
provider "helm" {
kubernetes {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
token = data.aws_eks_cluster_auth.this.token
}
}
provider "kubectl" {
apply_retry_count = 10
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
load_config_file = false
token = data.aws_eks_cluster_auth.this.token
}
data "aws_eks_cluster_auth" "this" {
name = module.eks.cluster_name
}
data "aws_availability_zones" "available" {}
locals {
region = "ap-southeast-1"
vpc_cidr = "10.0.0.0/16"
azs = slice(data.aws_availability_zones.available.names, 0, 3)
}
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "~> 19.12"
## EKS Cluster Config
cluster_name = "solr-demo"
cluster_version = "1.25"
## VPC Config
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
# EKS Cluster Network Config
cluster_endpoint_private_access = true
cluster_endpoint_public_access = true
## EKS Worker
eks_managed_node_groups = {
"solr-nodegroup" = {
node_group_name = "solr_managed_node_group"
# launch_template_os = "amazonlinux2eks"
public_ip = false
pre_userdata = <<-EOF
yum install -y amazon-ssm-agent
systemctl enable amazon-ssm-agent && systemctl start amazon-ssm-agent
EOF
desired_size = 2
ami_type = "AL2_x86_64"
capacity_type = "ON_DEMAND"
instance_types = ["t3.medium"]
disk_size = 30
}
}
}
module "eks_blueprints_addons_common" {
source = "aws-ia/eks-blueprints-addons/aws"
version = "~> 1.3.0"
cluster_name = module.eks.cluster_name
cluster_endpoint = module.eks.cluster_endpoint
cluster_version = module.eks.cluster_version
oidc_provider_arn = module.eks.oidc_provider_arn
create_delay_dependencies = [for ng in module.eks.eks_managed_node_groups: ng.node_group_arn]
eks_addons = {
aws-ebs-csi-driver = {
service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
}
vpc-cni = {
service_account_role_arn = module.aws_node_irsa.iam_role_arn
}
coredns = {
}
kube-proxy = {
}
}
enable_aws_efs_csi_driver = true
}
## Resource for VPC CNI Addon
module "aws_node_irsa" {
source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
version = "~> 5.20"
role_name_prefix = "${module.eks.cluster_name}-aws-node-"
attach_vpc_cni_policy = true
vpc_cni_enable_ipv4 = true
oidc_providers = {
main = {
provider_arn = module.eks.oidc_provider_arn
namespace_service_accounts = ["kube-system:aws-node"]
}
}
}
module "ebs_csi_driver_irsa" {
source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
version = "~> 5.20"
role_name_prefix = "${module.eks.cluster_name}-ebs-csi-driver-"
attach_ebs_csi_policy = true
oidc_providers = {
main = {
provider_arn = module.eks.oidc_provider_arn
namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
}
}
}
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 5.0"
name = "solr-demo-subnet"
cidr = local.vpc_cidr
azs = local.azs
private_subnets = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 4, k)]
public_subnets = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 8, k + 48)]
enable_nat_gateway = true
single_nat_gateway = true
public_subnet_tags = {
"kubernetes.io/role/elb" = 1
}
private_subnet_tags = {
"kubernetes.io/role/internal-elb" = 1
}
}
The following AWS resources will be created:
A VPC with private and public subnets.
An EKS cluster with a managed node group (t3.medium, two nodes) and the EKS add-ons aws-ebs-csi-driver, vpc-cni, coredns and kube-proxy, plus the EFS CSI driver.
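If you want to apply the template from the command line, the usual Terraform workflow plus a kubeconfig update is enough. The cluster name and region below match the template above; adjust them if you customized the code:
### Provision the VPC, EKS cluster, and add-ons
terraform init
terraform apply
### Point kubectl and helm at the new cluster
aws eks update-kubeconfig --region ap-southeast-1 --name solr-demo
### Quick sanity check: the worker nodes should become Ready
kubectl get nodes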
You can check the created resources in the AWS Management Console.
Install Solr using Helm
The next step is to install Solr using Helm:
### Install the Solr & Zookeeper CRDs
helm repo add apache-solr https://solr.apache.org/charts
helm repo update
### Install the Solr operator and Zookeeper Operator
kubectl create -f https://solr.apache.org/operator/downloads/crds/<version>/all-with-dependencies.yaml
helm install solr-operator apache-solr/solr-operator --version <version>
### Install the Solr, zookeeper
helm install solr apache-solr/solr -n solr --version <version>
Replace <version> with your chart version, or with the chart version that contains your desired Solr version.
Next, run the following commands to get the admin password and access the Solr UI:
### Get solr password
kubectl get secret solrcloud-security-bootstrap -n solr -o jsonpath='{.data.admin}' | base64 --decode
### Port forward Solr UI
kubectl port-forward service/solrcloud-common 3000:80 -n solr
Now open http://localhost:3000 in your browser; you should see the Solr admin UI.
Log in with the admin password retrieved above.
Setup Solr backup storage
After the Solr installation is done, it is time to set up Solr backup storage. At the time of writing this post, there are two backup storage options on AWS: EFS and S3. Depending on your Solr version and system requirements, you can choose either of them. In this demo, I'll use EFS as the backup storage since this storage type is compatible with most Solr versions. For more information, please visit this link.
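For reference, if your Solr version is new enough to use the S3 backup repository (roughly Solr 8.10 and later), the repository entry in the SolrCloud spec points at a bucket instead of a volume. A minimal sketch of such an entry is below; the bucket name is a placeholder, and the Solr pods still need IAM permissions on it:
backupRepositories:
  - name: "s3-backup"
    s3:
      region: "ap-southeast-1"
      bucket: "my-solr-backup-bucket"   # placeholder; the bucket must already exist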
To set up EFS as Solr's backup storage, you need to create an EFS file system in AWS. This Terraform code template will create the EFS resources:
module "efs" {
source = "terraform-aws-modules/efs/aws"
version = "1.2.0"
# File system
name = "solr-backup-storage"
performance_mode = "generalPurpose"
throughput_mode = "bursting"
# Mount targets / security group
mount_targets = {
"ap-southeast-1a" = {
subnet_id = module.vpc.private_subnets[0]
}
"ap-southeast-1b" = {
subnet_id = module.vpc.private_subnets[1]
}
"ap-southeast-1c" = {
subnet_id = module.vpc.private_subnets[2]
}
}
deny_nonsecure_transport = false
security_group_description = "EFS security group"
security_group_vpc_id = module.vpc.vpc_id
security_group_rules = {
"private-subnet" = {
cidr_blocks = module.vpc.private_subnets
}
}
}
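You also need a PersistentVolume and PersistentVolumeClaim pointing at this file system, defined in a file called solr-efs-pvc.yaml. A minimal sketch using static provisioning with the EFS CSI driver (installed by the add-ons module above) could look like this; the PV name, access mode and nominal storage size are illustrative, and solr-efs-claim is the claim name used throughout the rest of this walkthrough:
# solr-efs-pvc.yaml -- minimal sketch of a statically provisioned EFS volume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: solr-backup-pv
spec:
  capacity:
    storage: 50Gi              # nominal; EFS is elastic
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""         # static binding, no StorageClass involved
  csi:
    driver: efs.csi.aws.com
    volumeHandle: <EFS-id>     # replace with your file system ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: solr-efs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: solr-backup-pv   # bind directly to the PV above
  resources:
    requests:
      storage: 50Gi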
Replace <EFS-id> with the ID of the EFS file system (e.g. fs-1234567890abcdef) you just created in the previous step, and then run:
kubectl apply -f solr-efs-pvc.yaml -n solr
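Before moving on, you can confirm the claim is bound:
### The claim should show STATUS "Bound" against the EFS-backed volume
kubectl get pvc solr-efs-claim -n solr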
EFS is now ready to use in your cluster. Next, you need to upgrade Solr so that it uses EFS as its backup storage. First, create a values.yaml file as below:
backupRepositories:
  - name: "solr-backup"
    volume:
      source: # Required
        persistentVolumeClaim:
          claimName: "solr-efs-claim"
      directory: "solr-backup/" # Optional
Note that you will need to do this on both the origin and the target cluster.
Second, roll the change out using Helm:
helm upgrade --install solr -f values.yaml apache-solr/solr -n solr --version <version>
Finally, you should see the EFS claim referenced in your Solr StatefulSet; check it with this command:
kubectl describe statefulset/solrcloud -n solr | grep solr-efs-claim
ClaimName: solr-efs-claim
Create backup from the origin cluster
In order to restore Solr to the new cluster, you need a backup in hand. Back up a collection through the Solr Collections API with the following command:
curl --user admin:<password> "https://<origin-solr-endpoint>/solr/admin/collections?action=BACKUP&name=<backup-name>&collection=<collection-name>&location=file:///var/solr/data/backup-restore/solr-backup&repository=solr-backup"
If you have more than one collection, just repeat the process. Replace <password> with the admin password, <origin-solr-endpoint> with your origin Solr endpoint, and <backup-name> and <collection-name> with values of your choice.
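For large collections it can be convenient to run the backup asynchronously and poll for completion; the Collections API supports this via the async parameter. A sketch, where the request ID backup-001 is arbitrary:
### Trigger the backup asynchronously
curl --user admin:<password> "https://<origin-solr-endpoint>/solr/admin/collections?action=BACKUP&name=<backup-name>&collection=<collection-name>&location=file:///var/solr/data/backup-restore/solr-backup&repository=solr-backup&async=backup-001"
### Poll until the request reports completed
curl --user admin:<password> "https://<origin-solr-endpoint>/solr/admin/collections?action=REQUESTSTATUS&requestid=backup-001"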
You can check that the backup was written by opening a shell in one of the Solr pods and listing the backup directory.
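For example (the pod name here is illustrative; use one of your actual Solr pods):
### List the backup files written to the EFS-backed repository
kubectl exec -it solrcloud-0 -n solr -- ls -lh /var/solr/data/backup-restore/solr-backup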
Restore Solr to target EKS cluster
Since the backup storage of both the origin and target Solr clusters uses the same directory on AWS EFS (as set up in the previous steps), you only need to invoke the restore API on the target cluster:
curl --user admin:<password> "http://localhost:3000/solr/admin/collections?action=RESTORE&name=<backup-name>&location=/var/solr/data/backup-restore/solr-backup&collection=<collection-name>"
As I configured port-forwarding, I only need to replace the Solr endpoint with localhost:3000. Finally, go to the Solr UI and you should see that the collection has been restored successfully to your new EKS cluster.
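If you prefer the command line over the UI, listing the collections on the target cluster (again through the port-forward) should now include the restored one:
### The restored collection should appear in the list
curl --user admin:<password> "http://localhost:3000/solr/admin/collections?action=LIST"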
After that, you can start setting up autoscaling, ingress, security, and other resources for Solr in the new EKS cluster, and connect your applications to it.
Should you need any further information regarding Solr’s backup and restore API, please visit this link.
Conclusion
Solr is widely used by enterprises and SMBs. Depending on your system requirements and circumstances, migrating Solr to Amazon EKS will require different setups and approaches. I hope this post has given you useful information about Solr migration using backup and restore. Any comments are welcome. Thank you for reading!
Thanks to my co-author @coangha21 for your effort on this post!