Scalable Key-Value Store

Utkarsh Sharma

Posted on May 20, 2020

System Overview

This project implements a scalable multi-node key-value store using DynamoDB's consistent hashing policy. The client adds a key-value pair and retrieves the value for a key using simple HTTP GET and POST requests, as we will see in the next sections. Similarly, a client can add a node to the system or remove a node from the system without having to worry about key migration or updating routing information.

Running in Docker Environment

The following subsections cover running the system in a single-host Docker environment. To run the system on VCL or other cloud services, refer to the Running on Cloud Environment section.

Architecture

Each node in the system is an Ubuntu 18 image running a Flask application on the Python development server and storing key-value pairs in the node itself using SimpleCache. The nodes communicate with each other via HTTP GET and POST. For hashing, we compute the MD5 hash of the key and apply modulo 10000 to it, so that the maximum value on the hash ring is 10000.
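To make the hashing scheme concrete, here is a minimal Python sketch of a consistent-hash ring using MD5 modulo 10000. The class and method names (HashRing, add_node, lookup) are illustrative only, not taken from the project's source.

```python
import bisect
import hashlib

RING_SIZE = 10000  # positions on the ring, per the MD5-mod-10000 scheme

def ring_position(key: str) -> int:
    # MD5 digest of the key, reduced modulo the ring size
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % RING_SIZE

class HashRing:
    """Illustrative consistent-hash ring; names are hypothetical."""

    def __init__(self):
        self.positions = []   # sorted node positions on the ring
        self.nodes = {}       # position -> "ip:port" of the node

    def add_node(self, position: int, address: str):
        bisect.insort(self.positions, position)
        self.nodes[position] = address

    def remove_node(self, position: int):
        self.positions.remove(position)
        del self.nodes[position]

    def lookup(self, key: str) -> str:
        # A key belongs to the first node at or after its ring position,
        # wrapping around to the first node if needed.
        pos = ring_position(key)
        idx = bisect.bisect_left(self.positions, pos) % len(self.positions)
        return self.nodes[self.positions[idx]]

ring = HashRing()
ring.add_node(3000, "172.23.0.3:5000")  # n1 in the diagram below
ring.add_node(8000, "172.23.0.4:5000")  # n2 in the diagram below
print(ring.lookup("mykey"))             # -> address of the owning node
```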

Docker Compose Setup

Our present experimental system is based on the following setup. You can modify the system to your preferences by editing the docker-compose.yml file. The default state has two nodes. These two nodes are Ubuntu containers running a Flask app and communicating with each other via HTTP in a Docker network. A load balancer uses a round-robin algorithm to distribute load between these two nodes. We also have a standby container running the same Flask app, which we will use later for scaling up. The fourth container is a client, which interacts with the system and runs the test cases.

        (3000)
      ____n1___   ←--┐      n3 (standby)    |  n1     = 172.23.0.3
     |        |      |                      |  n2     = 172.23.0.4
     |        |     LB  ←--- client         |  n3     = 172.23.0.5
     |___n2___|   (load balancer)           |  LB     = 172.23.0.6
        (8000)                              |  client = 172.23.0.7
Deployment
  1. Running the containers

Run the following command to deploy the system and attach the host session to the logs of the spawned containers:

$docker-compose up

  2. Attach to the console of the client using the command below to gain access to the Docker network.

$docker container exec -it advdistsystems_client_1 /bin/bash

  3. Reconfigure the system

Run the following command to update the routing information for all the nodes in the system.

$sh reconfigure.sh

Running on Cloud Environment

The following subsections cover running the system in a cloud environment. This code has been deployed and tested on VCL and AWS.

Architecture

Each node is an EC2 instance of type t2.micro running the Ubuntu 18 image. The node runs the Apache web server. The application is built on the Flask framework, runs on a Python WSGI application server inside the node, and listens on port 80. The node also runs a Memcached server listening on port 11211. The Memcached server is responsible for storing the key-value pairs and for maintaining the state of the hash ring. We use Memcached to store the state of the hash ring because Apache can spawn multiple threads of the application, and the state of the hash ring must be consistent across all of them. The Apache web server runs 5 instances of the Flask application. The nodes communicate with each other via HTTP and with the Memcached server via TCP. All nodes sit behind a load balancer, an Ubuntu VM running HAProxy that uses round robin to route requests between servers. For hashing, we compute the MD5 hash of the key and apply modulo 10000 to it, so that the maximum value on the hash ring is 10000.
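As one illustration of how ring state might be shared across WSGI workers, the sketch below serializes the ring to the local Memcached server using the pymemcache library. The key name ring_state and the helper functions are assumptions for this sketch, not the project's actual code.

```python
import json

from pymemcache.client.base import Client

# Local Memcached server, as described above (port 11211)
mc = Client(("127.0.0.1", 11211))

def save_ring_state(positions, nodes):
    # Persist the ring so every Apache/WSGI worker sees the same view.
    # "ring_state" is a hypothetical key name used only in this sketch.
    state = {"positions": positions, "nodes": nodes}
    mc.set("ring_state", json.dumps(state))

def load_ring_state():
    raw = mc.get("ring_state")
    if raw is None:
        return {"positions": [], "nodes": {}}
    return json.loads(raw)
```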

Deployment
  1. Create a VM for each node you want in the system running the Ubuntu 18 image on VCL or any other cloud provider.

  2. Attach to the terminal of each node, then download the setup.sh file and run it using the command below. Repeat this process for every node deployed in the previous step. The file contains the commands to install all the components of the system.

$sudo wget https://raw.githubusercontent.com/UtkarshVIT/AdvDistSystems/master/setup.sh && sh setup.sh

  3. Edit the reconfigure.sh file to configure the hash table on each node. It is currently configured to deploy a two-node system but can be adapted to a custom scenario. The reconfiguration details are in the file itself. Complete the system reconfiguration by running the command below.

$sh reconfigure.sh

  4. Create a Layer 7 load balancer using HAProxy as explained here for VCL, or create a custom load balancer using the AWS load balancer service.

Running Test Cases

Important Note

For the Docker deployment, the test cases are configured for the default setup in docker-compose. For the cloud deployment, the test cases in ./tests/pytest.py are configured for a two-node system with a scale-up test that expands it to three nodes.

Running test case steps
  1. If running the Docker setup, attach to the console of the client using the command below. If running on a cloud service, skip this step, as the ports are public.

$docker container exec -it advdistsystems_client_1 /bin/bash

  2. Reconfigure the system

Reconfigure the system to clear the cache and update routing information. The reconfigure.sh file in the root directory already contains the information for the experimental setup; update the file only if you are using a custom setup. Run the following command from the client's shell:

$sh reconfigure.sh

  3. Reconfigure test cases

The IP addresses of the nodes are preset for the Docker setup. If running a cloud deployment, edit the IP addresses of the nodes in the file ./tests/pytest.py; if using Docker, you can skip this step.

  4. Execute test cases

The following command simulates four scenarios and executes the test cases in tests/pytest.py. For detailed information on the test cases, see pytest.py.

$python pytest.py
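For a sense of what such a test can look like, here is a minimal sketch of a put/get round trip through the load balancer using the requests library. The load-balancer address and the /route endpoint match the Common Commands section below, but the test body itself is illustrative, not copied from pytest.py.

```python
import requests

LB = "http://172.23.0.6:5000"  # load balancer address from the Docker setup

def test_put_then_get():
    # POST a key-value pair through the load balancer ...
    resp = requests.post(f"{LB}/route", data={"key": "foo", "val": "bar"})
    assert resp.status_code == 200

    # ... then read it back; round robin should still reach the owner node
    resp = requests.get(f"{LB}/route", params={"key": "foo"})
    assert resp.status_code == 200
    assert "bar" in resp.text

if __name__ == "__main__":
    test_put_then_get()
    print("put/get round trip OK")
```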

Common Commands

Important Note

Please ensure you have completed Steps 1 and 2 of Running Test Cases before running any of the commands below.

Commands
  • Set an env variable for the load balancer.

For example, in the Docker setup, use this command to set the env variable for the load balancer.

$lb="172.23.0.6:5000"



  • Adding a key-value pair [sending POST to LB]

$curl --data "key=<custom-key>&val=<custom-val>" $lb/route



  • Fetching the value for a key [sending GET to LB]

$curl $lb/route?key=<custom-key>



  • Adding a node to the system

$curl -X POST $lb/add_node/<key-of-new-node>/<ip:port-of-new-node>

For example, in the Docker setup, use this command to add a node at key 5000 in the system.

$curl -X POST $lb/add_node/5000/172.23.0.5:5000



  • Removing a node from the system

$curl $lb/remove_node/<ip:port-of-target-node>
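To tie these commands to the node side, below is a minimal, hypothetical sketch of how a node's Flask app could expose the /route endpoint used above. The handler body and the cachelib import (one possible source for the SimpleCache mentioned in the architecture) are assumptions for illustration, not the project's actual source.

```python
from cachelib import SimpleCache  # SimpleCache, per the Docker architecture
from flask import Flask, request

app = Flask(__name__)
cache = SimpleCache(default_timeout=0)  # 0 = entries never expire

@app.route("/route", methods=["GET", "POST"])
def route():
    # POST stores a key-value pair; GET returns the value for a key.
    # In the real system the node would first consult the hash ring and
    # forward the request to the owning node if the key is not local.
    if request.method == "POST":
        key = request.form["key"]
        cache.set(key, request.form["val"])
        return "stored", 200
    val = cache.get(request.args["key"])
    return (val, 200) if val is not None else ("not found", 404)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)  # Flask dev server, per the Docker setup
```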
