Distributed Key-Value Store in C/C++, from Scratch!
David Tandetnik
Posted on May 29, 2020
Background
In the Spring of 2020, I co-developed a distributed key-value store in C as part of Northeastern University's Software Development course. The course's capstone project - dubbed eau2 - consisted of creating a series of applications that all used an underlying distributed data storage solution built by us, the students.
The idea of the project was to expose students to designing, building, testing, and iterating on a large codebase. The result was an interesting and complex distributed app that I think did a good job of exposing students to what it's like to maintain real codebases as software engineers.
If you're a student (or even a professional developer), I'd highly recommend trying to build a large program like this layer-by-layer or at least enrolling in a course that offers something similar. The end result is not only impressive but also highly educational! 😃
Design
Language Choice
Eau2 is built using a restricted version of C++ that we called "C with classes". Essentially the code looks like C but uses C++ classes to allow for better code organization and object-oriented design. Most other features and libraries of C++ were forbidden.
Beware, this is not an app designed for production use, so these choices were not made with performance in mind. Rather, the idea was to force students to write their own utility classes (e.g. Object, String, Array) and therefore truly understand every part of their code. Staying away from most of C++ also conveniently avoided any of its inevitable rabbit holes...
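To give a feel for this style, here's a minimal sketch of the sort of hand-rolled utility classes the restriction forces you to write. The exact fields and methods in our versions differed, so treat this as illustrative rather than our actual code:

// Illustrative "C with classes" style: C-like code organized with C++
// classes and virtual dispatch, but no STL, templates, or exceptions.
#include <cstddef>  // size_t
#include <cstring>  // strlen, memcpy, strcmp

class Object {
public:
  virtual ~Object() {}
  virtual bool equals(Object* other) { return this == other; }
  virtual size_t hash() { return reinterpret_cast<size_t>(this); }
};

class String : public Object {
public:
  char* cstr_;   // owned, heap-allocated copy of the characters
  size_t size_;

  String(const char* s) : size_(strlen(s)) {
    cstr_ = new char[size_ + 1];
    memcpy(cstr_, s, size_ + 1);
  }
  ~String() { delete[] cstr_; }

  bool equals(Object* other) {
    String* o = dynamic_cast<String*>(other);
    return o != nullptr && strcmp(cstr_, o->cstr_) == 0;
  }
};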
Architecture
At a high level, the eau2 system aims to provide key-value storage over a distributed set of nodes in order to support applications which process large amounts of data. Given a cluster of compute nodes, users can run the eau2 system to store data across the cluster but interact with it as if it were local. Note, however, that the system does not have any of the consistency or availability guarantees normally featured in distributed systems.
In order to achieve this abstraction, the system is broken up into three layers:
- At the bottom is the key-value storage system. Since ultimately the data is stored across multiple nodes, this layer features a network protocol which allows the key-value stores on each node to interact with each other. Communication between the stores allows the user to view the key-value storage as a single system holding all of the data, without any knowledge of the underlying distribution.
- The layer above provides abstractions like DistributedColumns and DistributedDataFrames. These utility classes expose an API that allows for easy data aggregation and manipulation across the network.
- At the top sits the Application layer. This layer provides all the functionality necessary to run the eau2 system and interacts with the lower levels. The user builds in this layer to leverage the underlying key-value store as the backbone of their program.
Implementation
The bottom layer is implemented using one main class, the Store. This class has two main components, a Map and a Network.
- Map is the normal key-value storage structure, mapping Keys to Strings. Keys represent a tuple of a String, the key name, and an Integer, the ID of the node the Key lives on. Values in the Map are stored as Strings of serialized blobs of data.
- Network, on the other hand, contains all the necessary logic for registering and communicating with the other nodes in the cluster. The nodes pass Messages between each other to both PUT and GET data.
Overall, the Store layer has a simple public API supporting methods such as get, getAndWait, and put.
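As a rough sketch, that surface looked something like the code below. The exact signatures in our code differed, and the deserialization that turns these blobs back into DataFrames (as in the use case further down) is glossed over, so consider this illustrative:

// Illustrative sketch of the bottom layer. Object and String are the utility
// classes sketched earlier; Map, Network, and Message are our own classes too.
class Map;
class Network;

class Key : public Object {
public:
  String* name_;   // the key's name
  size_t home_;    // ID of the node this key's value lives on

  Key(const char* name, size_t home) : name_(new String(name)), home_(home) {}
  ~Key() { delete name_; }
};

class Store : public Object {
public:
  size_t node_id_;  // which node this store runs on
  Map* local_;      // name -> serialized blob, for keys homed on this node
  Network* net_;    // registration plus Message passing with the other nodes

  // Returns the serialized value for 'k', asking the key's home node for it
  // (via a GET Message) when that node isn't this one.
  String* get(Key* k);

  // Like get(), but blocks until some node has put a value for 'k'.
  String* getAndWait(Key* k);

  // Stores the serialized 'value' on the key's home node, sending a PUT
  // Message over the network when that node isn't this one.
  void put(Key* k, String* value);
};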
In the middle layer we have the useful abstraction classes: specifically, the DistributedDataFrame class and the DistributedColumn class.
- DistributedDataFrame is a queryable table of data which supports aggregate operations such as map and filter. All data stored in a DistributedDataFrame is stored in columnar format, with all data in each column being of a single type. The types supported are INT, BOOL, FLOAT, and STRING. Unlike normal DataFrames, DistributedDataFrames access their data as needed from other nodes across the network under the hood.
- DistributedColumn has the API of a normal column in a DataFrame but is implemented such that the values it stores are distributed in chunks across the cluster of nodes. This abstraction allows this class to represent much more data than the average column stored in local memory.
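The chunking idea is the interesting part: split the column into fixed-size chunks, store each chunk under its own Key on some node, and turn element access into a little index arithmetic plus a (possibly remote) Store lookup. A DistributedDataFrame is then essentially a collection of these columns. Here's a hypothetical sketch of that idea; the chunk size, field names, and the deserialize_float_chunk helper are all made up for illustration:

// Hypothetical sketch of how a DistributedColumn of floats could locate a
// value. Store, Key, and String are the classes from the layer below.
static const size_t CHUNK_SIZE = 4096;         // elements per chunk (made up)
float* deserialize_float_chunk(String* blob);  // hypothetical helper, not shown

class DistributedFloatColumn : public Object {
public:
  Store* store_;   // the key-value layer this column reads through
  Key** chunks_;   // chunks_[i] is the Key under which chunk i is stored
  size_t size_;    // total number of elements in the column

  float get(size_t row) {
    size_t chunk_idx = row / CHUNK_SIZE;   // which chunk holds this row
    size_t offset = row % CHUNK_SIZE;      // where in that chunk it sits
    // May go over the network if the chunk's Key is homed on another node.
    String* blob = store_->get(chunks_[chunk_idx]);
    float* values = deserialize_float_chunk(blob);
    float result = values[offset];
    delete[] values;   // (caching and blob ownership details omitted)
    return result;
  }
};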
For the top layer, we provide a single Application class. This class has the Store as a class field, providing the user who extends the Application class with access to all data stored on the system. Additionally, this class contains useful utility methods, such as this_node(), which returns the index of the current node.
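A minimal sketch of that base class might look like the following; the real class had more plumbing for startup and teardown (and the Store construction is simplified here), but this is the general shape:

// Illustrative sketch of the Application base class that user code extends.
class Application : public Object {
public:
  size_t idx_;    // index of the node this instance runs on
  Store* store;   // the distributed key-value store for the whole cluster

  Application(size_t idx) : idx_(idx), store(new Store(idx)) {}
  virtual ~Application() { delete store; }

  // Index of the current node; user code switches on this to decide which
  // role (producer, consumer, ...) to play, as in the example below.
  size_t this_node() { return idx_; }

  // Entry point implemented by each subclass.
  virtual void run_() = 0;
};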
Use Case
Here's an example program that runs on two nodes. Node 0 saves some data to the store. Node 1 waits until the data appears in the store, gets it, calculates the max, and stores the max back in the Store.
class Demo : public Application {
public:
  Key data_key{"data", 0};
  Key max_key{"max", 1};

  Demo(size_t idx) : Application(idx) {}

  void run_() {
    switch (this_node()) {
      case 0: producer(); break;
      case 1: maxer(); break;
    }
  }

  void producer() {
    float* vals = new float[3];
    vals[0] = 1;
    vals[1] = 10;
    vals[2] = 100;
    // Stores 'vals' as an Array in 'store' under the key 'data_key'
    DataFrame::fromArray(&data_key, store, 3, vals);
  }

  void maxer() {
    // Gets the key 'data_key' from the store, waiting until it exists
    // if it's not there
    DataFrame* frame = store->getAndWait(&data_key);
    float max = frame->get_float(0, 0);
    for (size_t i = 1; i < frame->nrows(); i++) {
      float f = frame->get_float(0, i);
      if (f > max) max = f;
    }
    // Stores 'max' as a Scalar in 'store' under the key 'max_key'
    DataFrame::fromScalar(&max_key, store, max);
  }
};
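To make the example concrete, here's one hypothetical way the Demo could be launched: run the same binary on every node and pass each node its index on the command line. The actual startup (network registration with the other nodes, teardown, and so on) is omitted, and the argument handling here is made up for illustration:

#include <cstdlib>   // atoi
#include <cstddef>   // size_t

// Hypothetical launcher: e.g. "./demo 0" on node 0 and "./demo 1" on node 1.
int main(int argc, char** argv) {
  size_t idx = (argc > 1) ? (size_t)atoi(argv[1]) : 0;
  Demo demo(idx);   // constructing the Application sets up this node's Store
  demo.run_();      // node 0 produces the data, node 1 computes the max
  return 0;
}

Running the same program with a different index on each node is what lets a single Application subclass like Demo describe the behavior of the whole cluster.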
Results
By the end of the project, we were able to read in and query data files up to a few gigabytes in size at reasonable speeds. For reference, it could take up to 5 minutes to read a 1 GB file into the system, depending on the hardware of the node doing the reading. Once loaded into the cluster, querying the data took about as long as a heavy HTTP request, usually under 500 ms.
These speeds are not too impressive at first glance, but because we rolled our own data adapters, variable wrappers, and network protocol (i.e. no HTTP!), there is inherent overhead in eau2 that would not be present had we used standard C++ libraries. We are therefore proud of the system's capabilities, even if they're not production-grade.
You can check out the source for our final demo, "Degrees of Linus", here. The demo calculates the number of collaborators that worked on GitHub projects a certain number of degrees away from Linus Torvalds.
Closing Thoughts
Co-developing eau2 was a great learning experience that exposed me to everything from manual memory management and network socket operations to data serialization and API design. As I mentioned earlier, I hope this project inspires you to try to build something similar; the idea isn't to build something perfect (ours certainly isn't) but rather to push yourself to design and iterate on a codebase that is larger and more complex than anything you've ever worked on for classes or hobby projects.
Good luck!