Setting up SolrCloud for Production
Loknath Dhar
Posted on July 28, 2022
I wasn't familiar with SolrCloud when I started working with it. While learning it and working with it directly, I made a lot of mistakes. Even setting up SolrCloud for production was a hassle for me (I'm a slow learner). But over time I got the hang of it, and I think writing an up-to-date guide to setting up SolrCloud on production servers is a good idea.
The official documentation is excellent and you should read it. This post is very straightforward. If you want explanations or want to learn more, please consult the official documentation.
Let's get started.
(I'm writing this guide based on my experience with Linux servers.)
To easily go to the installation directories, I put these in .bashrc:
export solr_home=/opt/solr
PATH=/opt/solr/bin:$PATH
export zookeeper_home=/opt/zookeeper
PATH=/opt/zookeeper/bin:$PATH
$ cd $zookeeper_home # takes you to /opt/zookeeper
$ cd $solr_home # takes you to /opt/solr
Note that this solr_home variable is not the same as the SOLR_HOME variable.
Zookeeper
To bring SolrCloud into production, we need external Zookeeper servers (not the embedded one) to manage our configuration and coordination centrally. We're going to work with 3 servers. We will install Zookeeper and Apache Solr on all servers with the same configuration.
First, let's configure Apache Zookeeper. Install the required Java version according to the official documentation. At the time of writing, the latest Zookeeper and Solr both need Java 11.
$ sudo apt install openjdk-11-jdk
# check java version
$ java -version
Download the latest stable version from the official website. Note that we don't want the source (src) bundle; we need the binary (bin) version.
Installation and configuration
Working with them as root is not recommended, but I'm going to work as the root user. My installation directory will be under /opt.
$ tar -xvf zookeeper-*.tar.gz -C /opt
$ ln -s /opt/zookeeper-* /opt/zookeeper
We need to create a configuration file for Zookeeper under $zookeeper_home/conf/. The file name will be zoo.cfg:
tickTime=2000
dataDir=/var/lib/zookeeper/data
dataLogDir=/var/lib/zookeeper/logs
clientPort=2181
4lw.commands.whitelist=mntr,conf,ruok
initLimit=5
syncLimit=2
# put IP addresses or hostnames in place of Server1, Server2, Server3
server.1=Server1:2888:3888
server.2=Server2:2888:3888
server.3=Server3:2888:3888
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
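The initLimit and syncLimit values are counted in ticks, so they only become time spans once you multiply them by tickTime. Here is a minimal sketch of that arithmetic. The helper name zk_limit_ms and the temporary sample file are my own, made up for illustration, and are not part of Zookeeper.

```shell
# Sketch: convert a tick-based Zookeeper limit into milliseconds.
# zk_limit_ms is a hypothetical helper, not a Zookeeper tool.
zk_limit_ms() {
  local cfg="$1" key="$2"
  awk -F= -v key="$key" '
    $1 == "tickTime" { tick = $2 }
    $1 == key        { limit = $2 }
    END { print tick * limit }
  ' "$cfg"
}

# Sample values copied from the zoo.cfg above.
sample_cfg="$(mktemp)"
printf 'tickTime=2000\ninitLimit=5\nsyncLimit=2\n' > "$sample_cfg"

echo "initLimit: $(zk_limit_ms "$sample_cfg" initLimit) ms"
echo "syncLimit: $(zk_limit_ms "$sample_cfg" syncLimit) ms"
```

With the values from this guide, followers get 10 seconds to connect and sync with the leader, and may fall at most 4 seconds behind it.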
We will create a Zookeeper environment file in the same place as zoo.cfg, which is under $zookeeper_home/conf. The file name will be zookeeper-env.sh:
JAVA_HOME="/usr/lib/jvm/java-1.11.0-openjdk-amd64"
ZOO_LOG_DIR="/var/lib/zookeeper/logs"
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
# raising jute.maxbuffer (the maximum znode data size) to 50,000,000 bytes (about 47.7 MiB)
JVMFLAGS="$JVMFLAGS -Djute.maxbuffer=50000000"
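A small note on the number: 50000000 bytes is a little short of 50 MiB. This quick sketch shows the arithmetic in shell; the variable names are only for illustration.

```shell
# Sketch: compare the configured jute.maxbuffer with an exact 50 MiB.
configured=50000000
exact_50_mib=$((50 * 1024 * 1024))

# Integer division rounds down, so this prints the whole MiB only.
echo "configured: $configured bytes (~$((configured / 1024 / 1024)) MiB, rounded down)"
echo "50 MiB:     $exact_50_mib bytes"
```

Either value works as a limit; use the second one only if you want exactly 50 MiB.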
Create the directories defined in the configuration:
mkdir -p /var/lib/zookeeper/data
mkdir -p /var/lib/zookeeper/logs
Create a myid text file under the /var/lib/zookeeper/data directory. Put the server id in that file as a single line. In the case of server 2, the file will contain: 2
echo "2" >/var/lib/zookeeper/data/myid
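The number in myid has to match the N of this machine's server.N line in zoo.cfg. If you don't want to type it by hand on every server, a lookup like the following can help. This is only a sketch: myid_for_host is a hypothetical helper of mine, and it assumes the server lines use plain hostnames or IPv4 addresses.

```shell
# Sketch: print the id N of the "server.N=host:2888:3888" line for a host.
# myid_for_host is a hypothetical helper, not a Zookeeper tool.
myid_for_host() {
  local cfg="$1" host="$2"
  awk -v host="$host" '
    /^server\.[0-9]+=/ {
      split($0, kv, "=")        # kv[1] = "server.N", kv[2] = "host:2888:3888"
      split(kv[2], addr, ":")   # addr[1] = host
      if (addr[1] == host) {
        sub(/^server\./, "", kv[1])
        print kv[1]
      }
    }
  ' "$cfg"
}

# Sample server lines copied from the zoo.cfg above.
sample_cfg="$(mktemp)"
printf 'server.1=Server1:2888:3888\nserver.2=Server2:2888:3888\nserver.3=Server3:2888:3888\n' > "$sample_cfg"

myid_for_host "$sample_cfg" Server2
```

On a real server you could redirect the output into /var/lib/zookeeper/data/myid, provided zoo.cfg names the machine the same way you pass it in.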
Now you can start Zookeeper whenever you want, but it needs to be started before Solr.
$ cd $zookeeper_home
$ bin/zkServer.sh start
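After starting, bin/zkServer.sh status tells you whether the node is a leader or a follower. Since ruok is in the 4lw.commands.whitelist of the zoo.cfg above, you can also ask a node directly over its client port. A minimal sketch, assuming nc (netcat) is installed; zk_is_ok is a hypothetical helper name.

```shell
# Sketch: ask a Zookeeper node "are you ok?" with the ruok four-letter word.
# zk_is_ok is a hypothetical helper; it assumes nc (netcat) is installed.
zk_is_ok() {
  local host="${1:-localhost}" port="${2:-2181}" reply
  reply="$(echo ruok | nc -w 2 "$host" "$port" 2>/dev/null)"
  # A healthy node answers with exactly "imok".
  [ "$reply" = "imok" ]
}

# Example (needs a running node):
#   zk_is_ok Server1 2181 && echo "Server1 is up"
```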
Apache SolrCloud
Download the latest Solr (bin version) on the server, move the file to /opt and extract the installation script from it.
$ tar xzf solr-*.tgz solr-*/bin/install_solr_service.sh --strip-components=2
Install it under /opt.
$ sudo bash ./install_solr_service.sh solr-*.tgz
$ ln -s solr-*/ solr
Edit bin/solr.in.sh for some configuration. We can also set this system-wide by creating a file under /etc/default.
#writing include file
SOLR_PID_DIR=/var/solr
SOLR_HOME=/var/solr/data
#LOG SETTINGS
LOG4J_PROPS=/var/solr/log4j2.xml
SOLR_LOGS_DIR=/var/solr/logs
SOLR_HEAP="1g"
SOLR_JAVA_HOME="/usr/lib/jvm/java-1.11.0-openjdk-amd64"
ZK_HOST="zk-server1:2181,zk-server2:2181,zk-server3:2181"
SOLR_LOG_LEVEL=INFO
# Data backup location for replication environment
SOLR_OPTS="$SOLR_OPTS -Dsolr.allowPaths=/mnt/solr_backup"
# for soft commits
SOLR_OPTS="$SOLR_OPTS -Dsolr.autoSoftCommit.maxTime=10000"
SOLR_HOST="zk-server-ip" # current server IP address
# Note: SOLR_HEAP (set above) takes precedence over SOLR_JAVA_MEM, so keep only one of the two
SOLR_JAVA_MEM="-Xms2g -Xmx2g"
ZK_CLIENT_TIMEOUT="30000"
SOLR_PORT=8983
# To make Solr reachable from other hosts (listen on all network interfaces)
SOLR_JETTY_HOST="0.0.0.0"
# Set these if you have set up authentication.
# With these set, the bin/solr script can run without authentication errors
SOLR_AUTH_TYPE="basic"
SOLR_AUTHENTICATION_OPTS="-Dbasicauth=username:password"
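ZK_HOST is a comma-separated list of host:port pairs, one per Zookeeper server. If you want to loop over the servers in a script (for example to check each one), you can split the string like this. A small sketch only; list_zk_nodes is a hypothetical helper name, and it falls back to port 2181 when a port is missing.

```shell
# Sketch: print one "host port" pair per Zookeeper server in a ZK_HOST string.
# list_zk_nodes is a hypothetical helper, not part of Solr.
list_zk_nodes() {
  echo "$1" | tr ',' '\n' | while IFS=: read -r host port; do
    echo "$host ${port:-2181}"
  done
}

# Same value as in the include file above.
ZK_HOST="zk-server1:2181,zk-server2:2181,zk-server3:2181"
list_zk_nodes "$ZK_HOST"
```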
Create directories defined in the configuration:
$ mkdir -p /var/solr/data
$ mkdir -p /var/solr/logs
And it's done.
You have to configure all servers like this.
Start SolrCloud by using the script:
$ bin/solr start -c -p 8983 -s /var/solr/data -z zk-server1:2181,zk-server2:2181,zk-server3:2181 -force
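Once the nodes are started, you can ask any of them for the cluster state through the Collections API (the CLUSTERSTATUS action). A minimal sketch, assuming curl is installed; solr_cluster_status is a hypothetical wrapper name of mine.

```shell
# Sketch: fetch the cluster state through Solr's Collections API.
# solr_cluster_status is a hypothetical wrapper around curl.
solr_cluster_status() {
  local host="${1:-localhost}" port="${2:-8983}"
  curl -s "http://${host}:${port}/solr/admin/collections?action=CLUSTERSTATUS"
}

# Example (needs a running node):
#   solr_cluster_status localhost 8983
```

The response should list the live nodes of the cluster, so you can see whether all 3 servers have joined. If you enabled basic authentication, you will need to pass the credentials to curl as well (for example with -u username:password).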
To get help:
$ bin/solr start -help
$ bin/solr restart -help # you get the idea
I hope you are able to get the SolrCloud running without any errors.
Use the bin/solr script to interact with Solr and Zookeeper. To learn more, see the official documentation.
To create a collection:
$ bin/solr create_collection -c col_name -d _default -shards 1 -replicationFactor 3 -p 8983 -V -force
To delete a collection:
$ bin/solr delete -c col_name -deleteConfig true -p 8983 -V
That's all. I will try to update this post from time to time. I'm also planning to include some useful commands later.
Hope things are working.
Happy searching!