Distributed storage in 30 minutes
Posted on February 14, 2022
Author: Igor Zolotarev
Hello, my name is Igor, and I am a part of the Tarantool DB team. When developing, I often need rapid prototypes of database applications, for example, to test code or to create an MVP. Of course, I would like such a prototype to require minimal effort to refine, in case it is decided to use it in production.
I don't like wasting my time configuring an SQL database, thinking about how to manage data sharding, or spending even more time studying connector interfaces. I prefer just to write a few lines of code, run it, and have everything working out of the box. To develop distributed applications rapidly, I use Cartridge, a framework for managing cluster applications based on Tarantool, a NoSQL database.
Today I will show how to quickly write a Cartridge-based application, cover it with tests, and run it. The article will be of interest to anyone tired of spending a lot of time prototyping applications, as well as those who want to try a new NoSQL technology.
Contents
From this article, you will learn what Cartridge is and what principles to have in mind when writing cluster business logic in it.
We will write a cluster application for storing data about employees of a company. The steps to accomplish it are:
- Creating an application from a template with cartridge-cli
- Describing your business logic in Lua in terms of Cartridge cluster roles
- Data Storage
- Custom HTTP API
- Writing tests
- Launching and configuring a small cluster locally
- Downloading configuration
- Configuring failover
Cartridge framework
Cartridge is a framework for developing cluster applications. It manages several instances of the Tarantool NoSQL database and shards data using the vshard module. Tarantool is a persistent in-memory database. It is very fast because it stores data in RAM, yet reliable, since Tarantool dumps all data to disk and supports replication. Cartridge takes care of configuring Tarantool nodes and sharding across cluster nodes, leaving the developer with only writing the application's business logic and configuring failover.
Advantages of Cartridge
- Sharding and replication out of the box
- Built-in failover support
- CRUD, a module for cluster NoSQL queries
- Integration testing of the entire cluster
- Ansible-based cluster management
- Cluster administration utility
- Monitoring tools
Creating the first application
To do this, we will need cartridge-cli. It is a utility for working with Cartridge applications. It allows you to create an application from a template, manage a locally running cluster, and connect to Tarantool instances.
Installing Tarantool and cartridge-cli
On Debian or Ubuntu:
curl -L https://tarantool.io/fJPRtan/release/2.8/installer.sh | bash
sudo apt install cartridge-cli
On CentOS, Fedora, or ALT Linux:
curl -L https://tarantool.io/fJPRtan/release/2.8/installer.sh | bash
sudo yum install cartridge-cli
On macOS:
brew install tarantool
brew install cartridge-cli
Let's create a template application named myapp:
cartridge create --name myapp
cd myapp
tree .
Now we get a project structure similar to this:
myapp
├── app
│   └── roles
│       └── custom.lua
├── test
├── init.lua
├── myapp-scm-1.rockspec
- the init.lua file is the entry point of the Cartridge application. It defines the cluster's configuration and calls the functions required at the start of each application node.
- the app/roles/ directory contains "roles" describing the application's business logic.
- the myapp-scm-1.rockspec file specifies the application's dependencies.
By now, we already have a working "Hello, world!" application. It can be started with the following commands:
cartridge build
cartridge start -d
cartridge replicasets setup --bootstrap-vshard
After that, opening localhost:8081/hello will show "Hello, world!".
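For example, you can check the endpoint with curl (assuming the template's default HTTP port):
curl http://localhost:8081/hello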
Let's now create a small template-based application, a sharded storage with an HTTP API for storing and receiving data. To do this, we need to understand how to implement cluster business logic in Cartridge.
Writing business logic in Cartridge
Each cluster application is based on roles, Lua modules describing the application's business logic. For example, they can be modules that store data, provide an HTTP API, or cache data from an Oracle database. A role is assigned to a set of instances joined by replication (a replica set), and it is enabled on each instance. Different replica sets can have different sets of roles.
Every Cartridge cluster node holds a copy of the cluster configuration. It describes the cluster's topology and, optionally, the configuration used by your roles. This configuration can be changed at runtime to affect the roles' behavior.
Each role has a structure similar to this:
return {
role_name = 'your_role_name',
init = init,
validate_config = validate_config,
apply_config = apply_config,
stop = stop,
rpc_function = rpc_function,
dependencies = {
'another_role_name',
},
}
Role lifecycle
- An instance is starting.
- The role named role_name waits for the start of all the roles it depends on, listed in dependencies.
- The validate_config function is called to check whether the role's configuration is valid.
- The role initialization function init is called. It performs the actions that need to be done once, when the role is started for the first time.
- The apply_config function is called to apply the configuration (if one is specified). The validate_config and apply_config functions are also called whenever the role's configuration changes.
- The role is saved in the registry. From there, it is available to other roles on the same node via cartridge.service_get('your_role_name').
- The functions declared in a role are available from other nodes via cartridge.rpc_call('your_role_name', 'rpc_function').
- Before a role is stopped or restarted, the stop function is called. It terminates the role, for example, by removing the fibers the role has created. A minimal role skeleton illustrating these callbacks is sketched right after this list.
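Here is that skeleton. It is only a sketch: the role name my_role, the interval config field, and the background fiber are illustrative assumptions, not part of the application we are building.
local log = require('log')
local fiber = require('fiber')

-- a background fiber created by the role (illustrative)
local worker = nil

local function validate_config(conf_new, conf_old)
    -- reject obviously broken configuration before it is applied
    local cfg = conf_new['my_role'] or {}
    assert(cfg.interval == nil or type(cfg.interval) == 'number',
        'my_role.interval must be a number')
    return true
end

local function init(opts)
    -- one-time initialization; opts.is_master is true on the master instance
    worker = fiber.create(function()
        while true do
            fiber.sleep(1)
            -- periodic work would go here
        end
    end)
    return true
end

local function apply_config(conf, opts)
    -- called on every configuration change, including the initial one
    local cfg = conf['my_role'] or {}
    log.info('my_role applied config, interval = %s', cfg.interval)
    return true
end

local function stop()
    -- release the resources created by the role
    if worker ~= nil then
        worker:cancel()
    end
    return true
end

return {
    role_name = 'my_role',
    init = init,
    validate_config = validate_config,
    apply_config = apply_config,
    stop = stop,
}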
Cluster NoSQL queries
There are several ways to write cluster queries in Cartridge:
- Calling functions via the vshard API (this is a complicated but flexible way):
vshard.router.callrw(bucket_id, 'app.roles.myrole.my_rpc_func', {...})
- Using the CRUD module (a simpler way; a usage sketch follows this list):
  - simple function calls: crud.insert/get/replace/...
  - support for calculating bucket_id is limited
  - roles must depend on crud-router/crud-storage
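For example, a couple of CRUD calls against the employee space we are about to create might look like this (a sketch, assuming the calling role depends on crud-router):
local crud = require('crud')

-- inserting a key-value object; crud calculates bucket_id from the sharding key
local _, err = crud.insert_object('employee', {
    employee_id = 'john_doe',
    name = 'John Doe',
    department = 'Delivery',
    position = 'Developer',
    salary = 10000,
})
assert(err == nil, tostring(err))

-- point lookup by the primary key
local employee, err = crud.get('employee', 'john_doe')
assert(err == nil, tostring(err))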
Application structure
Suppose we want a cluster with one router and two groups of storages, with two instances each. This topology is typical for clustered databases such as Redis Cluster and MongoDB. To let the stateful failover save the state of the current masters, the cluster will include one more instance, a stateboard. When increased reliability is required, it is better to use an etcd cluster instead of the stateboard.
The router will distribute requests across the cluster and manage the failover.
Writing custom roles
We will need to write two roles: one for data storage, and one for HTTP API.
In the app/roles directory, we create two new files: app/roles/storage.lua and app/roles/api.lua
Data Storage
Let's describe the role for data storage. In the init function, we will create a table and indexes for it, then add crud-storage to the role's dependencies.
The Lua code in the init function is equivalent to the following pseudo-SQL code:
CREATE TABLE employee(
bucket_id unsigned,
employee_id string,
name string,
department string,
position string,
salary unsigned
);
CREATE UNIQUE INDEX primary ON employee(employee_id);
CREATE INDEX bucket_id ON employee(bucket_id);
Add the following code to the app/roles/storage.lua file:
local function init(opts)
    -- opts.is_master indicates whether the function is called on the master or on a replica
    -- we create tables only on the master instance; they will appear automatically on the replicas
    if opts.is_master then
        -- creating a table with employees
        local employee = box.schema.space.create('employee', {if_not_exists = true})
        -- setting the format
        employee:format({
            {name = 'bucket_id', type = 'unsigned'},
            {name = 'employee_id', type = 'string', comment = 'Employee ID'},
            {name = 'name', type = 'string', comment = 'Employee full name'},
            {name = 'department', type = 'string', comment = 'Department'},
            {name = 'position', type = 'string', comment = 'Position'},
            {name = 'salary', type = 'unsigned', comment = 'Salary'},
        })
        -- creating the primary index
        employee:create_index('primary', {
            parts = {{field = 'employee_id'}},
            if_not_exists = true,
        })
        -- creating an index by bucket_id, which is required for sharding
        employee:create_index('bucket_id', {
            parts = {{field = 'bucket_id'}},
            unique = false,
            if_not_exists = true,
        })
    end
    return true
end

return {
    init = init,
    -- <<< note the crud-storage dependency
    dependencies = {'cartridge.roles.crud-storage'},
}
We will not need the rest of the role API functions, since our role has no configuration and does not allocate resources that would need to be cleaned up when the role stops.
HTTP API
We will need the second role to fill the tables with data and retrieve this data on request. The role will use Cartridge's built-in HTTP server and depends on crud-router.
Let's define a function to handle POST requests. The request body will contain the object to be saved to the database.
local function post_employee(request)
    -- getting an object from the request body
    local employee = request:json()
    -- writing it to the database
    local _, err = crud.insert_object('employee', employee)
    -- if an error occurs, writing it to the log and returning 500
    if err ~= nil then
        log.error(err)
        return {status = 500}
    end
    return {status = 200}
end
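Once the cluster is running, the handler could be exercised with curl like this (a sketch; the port is the template default, and the payload is illustrative):
curl -X POST http://localhost:8081/employee \
    -d '{"employee_id": "john_doe", "name": "John Doe", "department": "Delivery", "position": "Developer", "salary": 10000}'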
The GET handler will take a salary value as a parameter. The expected response is JSON with a list of employees whose salary is not lower than the one specified in the request.
SELECT employee_id, name, department, position, salary
FROM employee
WHERE salary >= @salary
local function get_employees_by_salary(request)
    -- getting the salary parameter from the query string
    local salary = tonumber(request:query_param('salary') or 0)
    -- selecting the employee data
    local employees, err = crud.select('employee', {{'>=', 'salary', salary}})
    -- if an error occurs, writing it to the log and returning 500
    if err ~= nil then
        log.error(err)
        return {status = 500}
    end
    -- employees contains the list of rows that meet the condition, plus the space format
    -- crud.unflatten_rows converts each flat tuple into a key-value table
    employees = crud.unflatten_rows(employees.rows, employees.metadata)
    -- dropping the technical bucket_id field from the response
    employees = fun.iter(employees):map(function(x)
        return {
            employee_id = x.employee_id,
            name = x.name,
            department = x.department,
            position = x.position,
            salary = x.salary,
        }
    end):totable()
    return request:render({json = employees})
end
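Querying it could then look like this (again assuming the template's default port):
curl 'http://localhost:8081/employees?salary=15000'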
Now let's write the init function for the role. Here we will turn to Cartridge's registry to get the HTTP server and use it to set up the application's HTTP endpoints.
local function init()
    -- getting the HTTP server from Cartridge's registry
    local httpd = assert(cartridge.service_get('httpd'), 'Failed to get httpd service')
    -- setting up the routes
    httpd:route({method = 'GET', path = '/employees'}, get_employees_by_salary)
    httpd:route({method = 'POST', path = '/employee'}, post_employee)
    return true
end
Putting everything together:
app/roles/api.lua
local cartridge = require('cartridge')
local crud = require('crud')
local log = require('log')
local fun = require('fun')

-- the 'GET /employees' handler returns a list of employees whose salary is not lower than the one specified in the request
local function get_employees_by_salary(request)
    -- getting the salary parameter from the query string
    local salary = tonumber(request:query_param('salary') or 0)
    -- selecting the employee data
    local employees, err = crud.select('employee', {{'>=', 'salary', salary}})
    -- if an error occurs, writing it to the log and returning 500
    if err ~= nil then
        log.error(err)
        return {status = 500}
    end
    -- employees contains the list of rows that meet the condition, plus the space format
    -- crud.unflatten_rows converts each flat tuple into a key-value table
    employees = crud.unflatten_rows(employees.rows, employees.metadata)
    employees = fun.iter(employees):map(function(x)
        return {
            employee_id = x.employee_id,
            name = x.name,
            department = x.department,
            position = x.position,
            salary = x.salary,
        }
    end):totable()
    return request:render({json = employees})
end

local function post_employee(request)
    -- getting an object from the request body
    local employee = request:json()
    -- writing it to the database
    local _, err = crud.insert_object('employee', employee)
    -- if an error occurs, writing it to the log and returning 500
    if err ~= nil then
        log.error(err)
        return {status = 500}
    end
    return {status = 200}
end

local function init()
    -- getting the HTTP server from Cartridge's registry
    local httpd = assert(cartridge.service_get('httpd'), 'Failed to get httpd service')
    -- setting up the routes
    httpd:route({method = 'GET', path = '/employees'}, get_employees_by_salary)
    httpd:route({method = 'POST', path = '/employee'}, post_employee)
    return true
end

return {
    init = init,
    -- <<< note the crud-router dependency
    dependencies = {'cartridge.roles.crud-router'},
}
init.lua
Let's describe the init.lua file. It is the entry point of a Cartridge application. To configure a cluster instance, the cartridge.cfg() function should be called in the application's init file.
cartridge.cfg(<opts>, <box_opts>)
- <opts>, the cluster parameters:
  - the list of available roles (all roles must be specified, even the permanent ones, for them to appear in the cluster)
  - sharding parameters
  - WebUI configuration
  - etc.
- <box_opts>, the default Tarantool parameters, passed to each instance's box.cfg{} (see the sketch after this list)
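For instance, a minimal sketch of passing <box_opts>; the memory limit and log level values here are illustrative assumptions:
local cartridge = require('cartridge')

local ok, err = cartridge.cfg({
    roles = {
        'cartridge.roles.vshard-storage',
        'cartridge.roles.vshard-router',
    },
}, {
    memtx_memory = 2 * 1024 * 1024 * 1024, -- 2 GiB for the memtx engine
    log_level = 5,                         -- 'info' verbosity
})
assert(ok, tostring(err))
Our application's init.lua looks like this: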
#!/usr/bin/env tarantool

require('strict').on()

-- specifying the path to search for modules
if package.setsearchroot ~= nil then
    package.setsearchroot()
end

-- configuring Cartridge
local cartridge = require('cartridge')

local ok, err = cartridge.cfg({
    roles = {
        'cartridge.roles.vshard-storage',
        'cartridge.roles.vshard-router',
        'cartridge.roles.metrics',
        -- <<< adding the crud roles
        'cartridge.roles.crud-storage',
        'cartridge.roles.crud-router',
        -- <<< adding our custom roles
        'app.roles.storage',
        'app.roles.api',
    },
    cluster_cookie = 'myapp-cluster-cookie',
})

assert(ok, tostring(err))
The final step is to describe the dependencies of the application in the myapp-scm-1.rockspec file.
package = 'myapp'
version = 'scm-1'
source = {
url = '/dev/null',
}
-- Adding the dependencies
dependencies = {
'tarantool',
'lua >= 5.1',
'checks == 3.1.0-1',
'cartridge == 2.7.3-1',
'metrics == 0.11.0-1',
'crud == 0.8.0-1',
}
build = {
type = 'none';
}
The application's code is ready to run, but let's write some tests to make sure it works as expected.
Writing tests
Every application needs testing. The usual luatest is sufficient for unit tests. But to write an integration test, you may want to use the cartridge.test-helpers module. It is shipped with Cartridge and can be used to run a cluster of any structure for the tests.
local cartridge_helpers = require('cartridge.test-helpers')

-- creating a test cluster
local cluster = cartridge_helpers.Cluster:new({
    server_command = './init.lua', -- test application entry point
    datadir = './tmp',             -- directory for xlog, snap, and other files
    use_vshard = true,             -- enable cluster sharding
    -- list of replica sets:
    replicasets = {
        {
            alias = 'api',
            uuid = cartridge_helpers.uuid('a'),
            roles = {'app.roles.custom'}, -- list of roles assigned to the replica set
            -- list of instances in the replica set:
            servers = {
                {instance_uuid = cartridge_helpers.uuid('a', 1), alias = 'api'},
                ...
            },
        },
        ...
    },
})
Let's write an auxiliary module to use in the integration tests. In this module, a test cluster with two replica sets is created. Each replica set contains one instance:
The auxiliary module code:
test/helper.lua
local fio = require('fio')
local t = require('luatest')
local cartridge_helpers = require('cartridge.test-helpers')

local helper = {}

helper.root = fio.dirname(fio.abspath(package.search('init')))
helper.datadir = fio.pathjoin(helper.root, 'tmp', 'db_test')
helper.server_command = fio.pathjoin(helper.root, 'init.lua')

helper.cluster = cartridge_helpers.Cluster:new({
    server_command = helper.server_command,
    datadir = helper.datadir,
    use_vshard = true,
    replicasets = {
        {
            alias = 'api',
            uuid = cartridge_helpers.uuid('a'),
            roles = {'app.roles.api'},
            servers = {
                {instance_uuid = cartridge_helpers.uuid('a', 1), alias = 'api'},
            },
        },
        {
            alias = 'storage',
            uuid = cartridge_helpers.uuid('b'),
            roles = {'app.roles.storage'},
            servers = {
                {instance_uuid = cartridge_helpers.uuid('b', 1), alias = 'storage'},
            },
        },
    },
})

function helper.truncate_space_on_cluster(cluster, space_name)
    assert(cluster ~= nil)
    for _, server in ipairs(cluster.servers) do
        server.net_box:eval([[
            local space_name = ...
            local space = box.space[space_name]
            if space ~= nil and not box.cfg.read_only then
                space:truncate()
            end
        ]], {space_name})
    end
end

function helper.stop_cluster(cluster)
    assert(cluster ~= nil)
    cluster:stop()
    fio.rmtree(cluster.datadir)
end

t.before_suite(function()
    fio.rmtree(helper.datadir)
    fio.mktree(helper.datadir)
    box.cfg({work_dir = helper.datadir})
end)

return helper
The integration test code:
test/integration/api_test.lua
local t = require('luatest')
local g = t.group('integration_api')

local helper = require('test.helper')
local cluster = helper.cluster

g.before_all = function()
    g.cluster = helper.cluster
    g.cluster:start()
end

g.after_all = function()
    helper.stop_cluster(g.cluster)
end

g.before_each = function()
    helper.truncate_space_on_cluster(g.cluster, 'employee')
end

g.test_get_employee = function()
    local server = cluster.main_server

    -- filling the storage with data via the HTTP API
    local response = server:http_request('post', '/employee', {
        json = {name = 'John Doe', department = 'Delivery', position = 'Developer',
                salary = 10000, employee_id = 'john_doe'},
    })
    t.assert_equals(response.status, 200)

    response = server:http_request('post', '/employee', {
        json = {name = 'Jane Doe', department = 'Delivery', position = 'Developer',
                salary = 20000, employee_id = 'jane_doe'},
    })
    t.assert_equals(response.status, 200)

    -- making a GET request and checking that the returned data is correct
    response = server:http_request('get', '/employees?salary=15000.0')
    t.assert_equals(response.status, 200)
    t.assert_equals(response.json[1], {
        name = 'Jane Doe', department = 'Delivery', employee_id = 'jane_doe',
        position = 'Developer', salary = 20000,
    })
end
Running the tests
If you have launched the application before, stop it:
cartridge stop
Removing the directory containing the data:
rm -rf tmp/
Building the application and installing the dependencies:
cartridge build
./deps.sh
Running the linter:
.rocks/bin/luacheck .
Running the tests to record the coverage:
.rocks/bin/luatest --coverage
Generating the coverage reports and looking at the result:
.rocks/bin/luacov .
grep -A999 '^Summary' tmp/luacov.report.out
Running locally
To run the application locally, you can use cartridge-cli; the roles we have written just need to be added to replicasets.yml:
router:
  instances:
  - router
  roles:
  - failover-coordinator
  - app.roles.api
  all_rw: false
s-1:
  instances:
  - s1-master
  - s1-replica
  roles:
  - app.roles.storage
  weight: 1
  all_rw: false
  vshard_group: default
s-2:
  instances:
  - s2-master
  - s2-replica
  roles:
  - app.roles.storage
  weight: 1
  all_rw: false
  vshard_group: default
To see the parameters of the configured instances, take a look at the instances.yml file.
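With the template defaults, instances.yml typically looks similar to this (the ports and URIs below are assumptions that may differ in your project; note the stateboard entry used later for the stateful failover):
myapp.router:
  advertise_uri: localhost:3301
  http_port: 8081

myapp.s1-master:
  advertise_uri: localhost:3302
  http_port: 8082

myapp.s1-replica:
  advertise_uri: localhost:3303
  http_port: 8083

myapp.s2-master:
  advertise_uri: localhost:3304
  http_port: 8084

myapp.s2-replica:
  advertise_uri: localhost:3305
  http_port: 8085

myapp-stateboard:
  listen: localhost:4401
  password: passwd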
Running the cluster locally:
cartridge build
cartridge start -d
cartridge replicasets setup --bootstrap-vshard
Now we can open the WebUI to load the roles' configuration and to set up failover. To configure a stateful failover:
- click the Failover button
- choose stateful
- specify the stateboard address and the password:
  - localhost:4401
  - passwd
Let's see how it works. Right now, the leader in the s-1 replica set is s1-master. Let's stop it:
cartridge stop s1-master
Now s1-replica becomes the leader. Let's restore s1-master:
cartridge start -d s1-master
s1-master is up again, but s1-replica remains the leader, because the failover is stateful.
Let's load the configuration for the cartridge.roles.metrics role. To do this, switch to the Code tab and create a metrics.yml file with the following contents:
export:
- path: '/metrics'
  format: prometheus
- path: '/health'
  format: health
After we click the Apply button, metrics become available on every node of the application at the localhost:8081/metrics endpoint, and a health-check page appears at localhost:8081/health.
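A quick check from the command line (assuming the first instance's default HTTP port):
curl http://localhost:8081/metrics
curl http://localhost:8081/health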
This completes the basic setup of a small application: the cluster is up and running, and we can now talk to it over the HTTP API or via a connector, as well as expand its functionality.
Conclusion
Many developers hate wasting time configuring a database. We prefer simply writing code and leaving cluster management to a framework. To solve this problem, I use Cartridge, a framework that manages a cluster containing several instances of a Tarantool database.
Now you know:
- how to build a reliable cluster application based on Cartridge and Tarantool,
- how to write the code for a small application to store information about employees,
- how to add tests,
- how to configure a cluster.
I hope my story was helpful and that you will start using Cartridge to create applications. I would be glad to hear your feedback on whether you managed to write a Cartridge application quickly and easily, as well as any questions about its use.
What's next?
- Check out the documentation on the official website
- Try Cartridge in the sandbox
- Ask your questions to the community in the Telegram chat