Physically isolating tenants with tablespaces and smart drivers in YugabyteDB
Franck Pachot
Posted on August 27, 2024
When designing a multitenant application, whether your tenants are logically separated as different databases, schemas, or simply different rows in one table, you may consider physical isolation for performance and security reasons. This helps avoid resource contention and ensures that one tenant's data is contained and inaccessible to others.
In a distributed database like YugabyteDB, data is distributed and replicated across multiple nodes, making it challenging to assign specific data to a particular server. However, nodes can be tagged with a cloud, region, and zone. These tags are designed for cloud deployments, but their names are abstract labels: on-premises, they can just as well designate data centers, racks, or sets of physical servers. They enable the mapping of data storage and processing to specific parts of your cluster, facilitating physical isolation.
To constrain data placement, you need to create tablespaces. In monolithic databases, a tablespace stores data in a specific filesystem. In a cloud-native database like YugabyteDB, a tablespace defines the placement blocks for distributing and replicating data.
If you have one database per tenant, you can set the default tablespace for the database and its owner. Each tenant may have a dedicated user as the database owner, and you can grant the privilege to create tables and indexes in a specific tablespace, securing the relationship between a tenant, its database owner, and its objects' tablespace.
For tenants in a single table, where row-level security is used for logical isolation, you can partition the table and assign a tablespace per partition. This ensures that each tenant's data is constrained to specific servers belonging to a zone, region, or cloud.
Although the data reads and writes are distributed, the query layer that processes SQL queries uses resources and may expose data in memory or log files. It's crucial to ensure that one tenant connects only to specific nodes. The application can manage this by starting a connection pool per tenant with a list of hosts or by utilizing YugabyteDB smart drivers, which automatically discover nodes and allow you to define a zone, region, or cloud to connect to.
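For example, the application can open a pool per tenant with a plain PostgreSQL driver and an explicit host list, since libpq accepts a comma-separated list of hosts. Here is a minimal Python sketch; the addresses are hypothetical, and spreading connections randomly over the list requires libpq 16 or later (load_balance_hosts):
# Per-tenant connection pool restricted to one region's nodes,
# using plain psycopg2 (libpq accepts a comma-separated host list).
from psycopg2.pool import SimpleConnectionPool

region3_hosts = "10.0.0.7,10.0.0.8,10.0.0.9"  # hypothetical region3 nodes

wild_river_pool = SimpleConnectionPool(
    minconn=1,
    maxconn=10,
    host=region3_hosts,           # libpq tries these hosts in order
    port=5433,                    # YSQL port
    dbname="wild_river_brewery",
    user="wild_river_role",
    password="password",
)

conn = wild_river_pool.getconn()
try:
    with conn.cursor() as cur:
        cur.execute("select yb_server_region()")
        print(cur.fetchone())     # should print region3
finally:
    wild_river_pool.putconn(conn)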
We have demonstrated all this in a Community Open Hours session. Here is the script I used for my demo:
Start a three-node YugabyteDB cluster with one node in each region (region1, region2, region3):
git clone https://github.com/FranckPachot/yb-compose
cd yb-compose
docker compose up -d
As I want high availability for each tenant, I scale to nine nodes, with three nodes per region:
docker compose up -d --scale yb=9 --no-recreate
I can connect to any node, letting Docker Compose load-balance the connection to the yb service:
docker compose run -it pg psql -h yb
I create two databases for two tenants:
create database morning_brew_co;
create database wild_river_brewery;
I create a user for each tenant, with no privileges on the other tenant's database:
CREATE ROLE wild_river_role WITH LOGIN PASSWORD 'password';
ALTER DATABASE wild_river_brewery OWNER TO wild_river_role;
REVOKE CONNECT ON DATABASE wild_river_brewery FROM PUBLIC;
GRANT CONNECT ON DATABASE wild_river_brewery TO wild_river_role;
CREATE ROLE morning_brew_role WITH LOGIN PASSWORD 'password';
ALTER DATABASE morning_brew_co OWNER TO morning_brew_role;
REVOKE CONNECT ON DATABASE morning_brew_co FROM PUBLIC;
GRANT CONNECT ON DATABASE morning_brew_co TO morning_brew_role;
I create one tablespace per region to constrain each tenant's data to its physical servers, with replication within the region for high availability:
CREATE TABLESPACE region1 WITH (
replica_placement='{"num_replicas": 3, "placement_blocks": [
{"cloud":"cloud","region":"region1","zone":"zone","min_num_replicas":3}
]}');
CREATE TABLESPACE region2 WITH (
replica_placement='{"num_replicas": 3, "placement_blocks": [
{"cloud":"cloud","region":"region2","zone":"zone","min_num_replicas":3}
]}');
CREATE TABLESPACE region3 WITH (
replica_placement='{"num_replicas": 3, "placement_blocks": [
{"cloud":"cloud","region":"region3","zone":"zone","min_num_replicas":3}
]}');
"cloud", "region", and "zone" are keywords defined in YugabyteDB to tag where the server runs, but you can use them as you want. You can use "zone" for one set of servers, and "region" for data centers.
The minimum number of servers in a placement block depends on the replication factor. It is not recommended to run unprotected, like with replication factor 1, and that's why the topology doesn't provide a "server" tag.
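These tags come from the placement flags each server is started with. The Docker Compose lab used here presumably sets them this way for a node of region1 (the flag names are the actual yb-tserver ones; other required flags are omitted):
# Each yb-tserver declares its own placement; the tablespace
# placement_blocks above must match these values.
yb-tserver \
  --placement_cloud=cloud \
  --placement_region=region1 \
  --placement_zone=zone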
I assign the tablespaces to my tenant databases and users, and grant the privilege to create objects in them:
alter database morning_brew_co set default_tablespace = region1;
alter user morning_brew_role set default_tablespace = region1;
grant create on tablespace region1 to morning_brew_role;
alter database wild_river_brewery set default_tablespace = region3;
alter user wild_river_role set default_tablespace = region3;
grant create on tablespace region3 to wild_river_role;
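To sanity-check the defaults, a tenant session can display its setting after connecting:
\c morning_brew_co morning_brew_role
show default_tablespace;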
For the demo, I create a table in one tenant, connecting with the right user to the right database:
\c wild_river_brewery wild_river_role
create table product(
id serial primary key,
title text,
description text,
price numeric(10,2),
quantity int
);
\d product
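To confirm the placement, the tablespace assigned to the table is visible in the catalog:
select tablename, tablespace from pg_tables where tablename = 'product';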
I can run this DDL from any node, but for DML, I must connect only to the nodes of one region. I can use a smart driver or do the same manually with psql:
select yb_server_region(), current_setting('listen_addresses');
select host, cloud, region, zone from yb_servers() order by 2,3,4;
\! set | grep ^PGLOADBALANCE
select string_agg(host,',') as hostlist from yb_servers()
where region='region3';
\gset
\c wild_river_brewery wild_river_role :hostlist
select yb_server_region(), current_setting('listen_addresses');
This retrieves the list of nodes for a single region and uses it to establish the connection, spread randomly over those hosts because I've set the libpq PGLOADBALANCEHOSTS environment variable to random.
The YugabyteDB smart drivers, which are forks of the PostgreSQL drivers made cluster-aware, automatically handle node discovery after the first connection by querying the yb_servers() view. They provide additional connection parameters to load balance and to restrict connections to a given cloud, region, or zone.
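With the Python smart driver (psycopg2-yugabytedb, a fork of psycopg2), for example, the restriction becomes a connection parameter. A minimal sketch, reusing the yb service name from the compose file:
# After the first connection, the smart driver discovers all nodes via
# yb_servers() and balances new connections only across the requested
# topology.
import psycopg2

conn = psycopg2.connect(
    host="yb",                           # any reachable node, for discovery
    port=5433,
    dbname="wild_river_brewery",
    user="wild_river_role",
    password="password",
    load_balance="true",                 # enable cluster-aware balancing
    topology_keys="cloud.region3.zone",  # restrict to region3 nodes
)
with conn.cursor() as cur:
    cur.execute("select yb_server_region()")
    print(cur.fetchone())                # should print region3
conn.close()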
I can insert data now that I'm safely connected to the right set of servers, and guaranteed that data will be written only there:
INSERT INTO product (title, description, price, quantity)
VALUES ('River Pale Ale', 'A smooth and hoppy pale ale with citrus notes, brewed by Wild River Brewery.'
, 6.50, 200);
\x
select * from product;
\q
To verify that no data is visible on the other servers, I connect to each container and grep for one word of the inserted row (I didn't enable encryption at rest):
for host in yb-compose-yb-{1..9}
do
docker exec -it $host bash -c 'cd /root/var/data/yb-data/tserver ; grep -lr "citrus" . | sed -e "s/^/$(hostname -i): /" '
done
At this point, the row may still be only in memory (the memtables), so I flush all tablets to their on-disk files:
for host in yb-compose-yb-{1..9}
do
docker exec -it $host bash -c "yb-ts-cli -server_address $host:9100 flush_all_tablets"
done
Then I run the same search again; the word should now appear only in the files of the region3 servers:
for host in yb-compose-yb-{1..9}
do
docker exec -it $host bash -c 'cd /root/var/data/yb-data/tserver ; grep -lr "citrus" . | sed -e "s/^/$(hostname -i): /" '
done
To show how to do this with single-table multi-tenancy, I create another database:
docker compose run -it pg psql -h yb
create database multitenant;
\c multitenant
I create a partitioned table and assign a partition to one region:
create table product(
primary key (tenant, id),
tenant int default current_setting('app.tenant')::int,
id serial,
title text,
description text,
price numeric(10,2),
quantity int
) partition by list(tenant);
create table product_region2
partition of product
for values in (2,22,222)
tablespace region2;
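The other tenants get their own partitions in the same way, each pinned to a region's tablespace; for example (the tenant ids are illustrative):
create table product_region1
partition of product
for values in (1,11,111)
tablespace region1;
create table product_region3
partition of product
for values in (3,33,333)
tablespace region3;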
The tenant id is assigned with a session variable that the application can set before executing its queries:
set app.tenant=2;
INSERT INTO product (title, description, price, quantity)
VALUES ('Bush Beer', 'A strong Belgian ale with a rich amber color, brewed by Brasserie Dubuisson'
, 6.50, 200);
\q
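The same session variable can drive the row-level security mentioned earlier, so that logical isolation and physical placement rely on one setting. A minimal sketch of such a policy, using standard PostgreSQL syntax:
-- Non-owner roles then see only the rows of their own tenant.
alter table product enable row level security;
create policy tenant_isolation on product
using (tenant = current_setting('app.tenant')::int);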
Even though there is no direct mapping to physical hosts, because the database is distributed and designed to be resilient to failures, YugabyteDB lets you use geo-partitioning techniques to allocate SQL processing and data storage to a specific set of hosts. This is particularly useful for multitenancy, as it provides physical isolation for some tenants.
Related documentation:
Control the placement of data using tablespaces
Manage connection load balancing automatically using smart drivers