hyper-scale multi-tenant for SaaS: an example with pgbench


Franck Pachot

Posted on October 4, 2021


I'll write soon about the alternatives for multi-tenancy on YugabyteDB, as this is something a lot of users are asking for (for SaaS - Software as a Service). The solutions are a balance between isolation and scalability. Here I'm showing the most scalable one, where a new tenant does not require any additional resources: everything runs in the same database, accessed by the same connection pool, sharing the same schema, yet isolated from a security point of view thanks to Row Level Security (RLS). I am showing an example on YugabyteDB. The same runs on PostgreSQL, but there you will hit a scalability limit because you cannot add writer nodes.

In order to show how transparent it can be for a legacy application, I'll take the example of the built-in pgbench.

database, table owner and user

I create a "saas" database with two users:

  • "saas_owner" will own the tables in the public schema, and have access to all tenants
  • "saas_user" will have all DML grants to the tables, but restricted to one tenant
PGHOST=yb1.pachot.net PGPORT=5433 PGUSER=yugabyte PGPASSWORD=yugabyte psql
 create database saas;
 create user saas_owner;
 grant all on schema public to saas_owner;
 create user saas_user;
 grant all on schema public to saas_user;
 \c saas saas_owner

psql (15devel, server 11.2-YB-2.7.2.0-b0)
You are now connected to database "saas" as user "saas_owner".

Now, connected as "saas_owner", I'll create the pgbench tables.

tables with "tenant_id"

In this solution, all tenant data is stored in the same tables. This is an advantage when you want to manage everything as one. Data isolation is ensured by an additional "tenant_id" column (you can name it as you want), added in front of the primary key. In addition, I want this multi-tenancy to be transparent to the application, so I add a default value read from a 'rls.tenant_id' session variable. Again, you can name it as you want, and using a session variable is just one option: the RLS condition is customizable.

I could create the tables with pgbench -i and add the "tenant_id" column later, but I would also have to add this column to the primary keys and in front of all index column lists. It is easier to write the full DDL:

\c saas saas_owner

CREATE TABLE pgbench_accounts (
 tenant_id int default current_setting('rls.tenant_id')::int not null,
 aid integer NOT NULL,
 bid integer,
 abalance integer,
 filler character(84),
 PRIMARY KEY(tenant_id,aid)
);
CREATE TABLE pgbench_branches (
 tenant_id int default current_setting('rls.tenant_id')::int not null,
 bid integer NOT NULL,
 bbalance integer,
 filler character(88),
 PRIMARY KEY(tenant_id,bid)
);
CREATE TABLE pgbench_history (
 tenant_id int default current_setting('rls.tenant_id')::int not null,
 tid integer,
 bid integer,
 aid integer,
 delta integer,
 mtime timestamp without time zone,
 filler character(22),
 PRIMARY KEY(tenant_id,tid,bid,aid)
);
CREATE TABLE pgbench_tellers (
 tenant_id int default current_setting('rls.tenant_id')::int not null,
 tid integer NOT NULL,
 bid integer,
 tbalance integer,
 filler character(84),
 PRIMARY KEY(tenant_id,tid,bid)
);


The default current_setting('rls.tenant_id')::int reads the session value at INSERT time, so the multi-tenant application only has to set this variable when grabbing a connection from the connection pool. All inserts will then automatically put data into the right tenant.
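
For example, once the variable is set, an insert that doesn't mention "tenant_id" at all lands in the right tenant. A quick sketch (run as "saas_owner", since the grants for "saas_user" come later):

set rls.tenant_id = 7;
-- tenant_id is not in the column list: the default fills it in
insert into pgbench_branches (bid, bbalance) values (1, 0);
select tenant_id, bid, bbalance from pgbench_branches; -- shows tenant_id = 7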

Note that you can use current_user instead of a session variable in case you want a different connection pool for each tenant. But connections have a limit, and I'm presenting here the most scalable design, which can work with millions of tenants if needed.
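
For illustration, a current_user-based policy could look like the following sketch, assuming a hypothetical convention where database users are named like "tenant_42" (this is not what I use in this post):

create policy tenant_id on pgbench_accounts for all to public
 using ( tenant_id = substr(current_user::text, length('tenant_') + 1)::int );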

Row Level Security (RLS)

The default value in the column definition makes multi-tenancy transparent for inserts, for sessions that set the session variable, but it doesn't enforce any security. PostgreSQL has a feature to enforce it, which also brings this transparency to the other DML statements:

\c saas saas_owner

alter table pgbench_accounts enable row level security;
alter table pgbench_branches enable row level security;
alter table pgbench_history  enable row level security;
alter table pgbench_tellers  enable row level security;

From this point, the "saas_owner" can still access all rows, but other users will not see anything, even when fully granted access:

\c saas saas_owner

grant select, insert, delete, update on pgbench_accounts to saas_user;
grant select, insert, delete, update on pgbench_branches to saas_user;
grant select, insert, delete, update on pgbench_history to saas_user;
grant select, insert, delete, update on pgbench_tellers to saas_user;

\c saas saas_user
saas=> insert into pgbench_branches values (0,0,0,0);
ERROR:  new row violates row-level security policy for table "pgbench_branches"


The "saas_owner" has now to create policies to grant row-level access. Here I'll do that on this 'rls.tenant_id' session variable:

\c saas saas_owner

create policy tenant_id on pgbench_accounts for all to public using(tenant_id=current_setting('rls.tenant_id')::int);
create policy tenant_id on pgbench_branches for all to public using(tenant_id=current_setting('rls.tenant_id')::int);
create policy tenant_id on pgbench_history  for all to public using(tenant_id=current_setting('rls.tenant_id')::int);
create policy tenant_id on pgbench_tellers  for all to public using(tenant_id=current_setting('rls.tenant_id')::int);


Now, let's see what the "saas_user" can do:

saas=> \c saas saas_user
psql (15devel, server 11.2-YB-2.7.2.0-b0)
You are now connected to database "saas" as user "saas_user".

saas=> set rls.tenant_id=0;
SET
saas=> insert into pgbench_branches values (0,0,0,0);
INSERT 0 1
saas=> set rls.tenant_id=42;
SET
saas=> insert into pgbench_branches values (42,0,0,0);
INSERT 0 1
saas=> delete from pgbench_branches returning *;
 tenant_id | bid | bbalance |                                          filler
-----------+-----+----------+------------------------------------------------------------------------------------------
        42 |   0 |        0 | 0
(1 row)

DELETE 1
saas=> set rls.tenant_id=0;
SET
saas=> delete from pgbench_branches returning *;
 tenant_id | bid | bbalance |                                          filler
-----------+-----+----------+------------------------------------------------------------------------------------------
         0 |   0 |        0 | 0
(1 row)

DELETE 1


You see the power of it. In this example, the delete statement deleted only the row from the currently set 'rls.tenant_id'. If you have played with Oracle Multitenant and Application Containers, this looks like an ALTER SESSION SET CONTAINER=..., doesn't it? Let's see how to switch between tenants here.

PGOPTIONS

Again, using a session variable is my choice. You can put pretty much anything in the RLS policy. Some will use current_user or current_schema, some will read from a function or from an external system... Just be careful: this expression will be evaluated by all statements.
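
One way to keep that evaluation logic in a single place, so it is easy to change later, is to wrap the lookup in a function that the policies reference. A sketch only ("app_tenant_id" is a name I'm making up here, not part of the post's setup):

create function app_tenant_id() returns int
 language sql stable
 as $$ select current_setting('rls.tenant_id')::int $$;
-- the policies would then read: using ( tenant_id = app_tenant_id() )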

The session variable will probably be set by the session factory that gets a connection from the connection pool (a sketch of that call follows). But here, as pgbench is a command-line tool, I'll set the value with PGOPTIONS.
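
From application code, SET cannot take a bind parameter, but the equivalent set_config() function can. A minimal sketch of what the session factory could execute on connection checkout:

-- third argument false = session scope (true would be transaction scope)
select set_config('rls.tenant_id', '42', false);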

With the following, I'm generating values for tenant_id=0 with pgbench initialization:

PGOPTIONS="-c rls.tenant_id=0" pgbench -i -IG -U saas_owner -d saas

generating data (server-side)...
done in 7.23 s (server-side generate 7.23 s).


I didn't use the "saas_user" here because pgbench initialization starts with a TRUNCATE, and RLS policies do not apply to TRUNCATE: it would remove the rows of all tenants.

I'll initialize the other tenants by copying the tenant_id=0 data. This also has to be done as "saas_owner" because I have to read from another tenant:

for i in {1..9} ; do
   PGOPTIONS="-c rls.tenant_id=$i" psql --quiet -U saas_owner -d saas -c "
   insert into pgbench_accounts(aid,bid,abalance,filler) select aid,bid,abalance,filler from pgbench_accounts where tenant_id=0;
   insert into pgbench_branches(bid,bbalance,filler) select bid,bbalance,filler from pgbench_branches where tenant_id=0;
   insert into pgbench_tellers(tid,bid,tbalance,filler) select tid,bid,tbalance,filler from pgbench_tellers where tenant_id=0;
  " &
  done ; time wait

real    0m40.226s

This ran in parallel and was limited only by my small lab server. My goal is to publish numbers comparing the different multi-tenant solutions on a real-size cluster, but that's for another post.

multi-tenant queries

With "saas_owner" I can see all data from all tenants:

\c saas saas_owner
select * from pgbench_branches;

 tenant_id | bid | bbalance | filler
-----------+-----+----------+--------
         5 |   1 |        0 |
         1 |   1 |        0 |
         6 |   1 |        0 |
         7 |   1 |        0 |
         9 |   1 |        0 |
         4 |   1 |        0 |
         0 |   1 |        0 |
         2 |   1 |        0 |
         8 |   1 |        0 |
         3 |   1 |        0 |
(10 rows)


The random order gives a clue about the physical storage. As "tenant_id" is the first column of my primary key, the YugabyteDB table is sharded on its hashed value. This is where YugabyteDB is the right platform for this design. On a monolithic database you don't want to store millions of tenants in the same table. But here you can, because the table is split into tablets: you have control over their number (as in the sketch below), and the database balances them automatically.
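
For example, the initial number of tablets can be set at creation time with the YugabyteDB SPLIT clause. A sketch of what the branches table would look like with an explicit tablet count (16 is an arbitrary number for this small lab):

CREATE TABLE pgbench_branches (
 tenant_id int default current_setting('rls.tenant_id')::int not null,
 bid integer NOT NULL,
 bbalance integer,
 filler character(88),
 PRIMARY KEY(tenant_id,bid)
) SPLIT INTO 16 TABLETS;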

I have one more thing to do as the "saas_owner": create the foreign keys:

\c saas saas_owner

alter table pgbench_tellers add constraint pgbench_tellers_bid_fkey foreign key (tenant_id,bid) references pgbench_branches;
alter table pgbench_accounts add constraint pgbench_accounts_bid_fkey foreign key (tenant_id,bid) references pgbench_branches;
alter table pgbench_history add constraint pgbench_history_bid_fkey foreign key (tenant_id,bid) references pgbench_branches;
alter table pgbench_history add constraint pgbench_history_tid_fkey foreign key (tenant_id,tid) references pgbench_tellers;
alter table pgbench_history add constraint pgbench_history_aid_fkey foreign key (tenant_id,aid) references pgbench_accounts;


Don't forget that all primary keys, foreign keys, and indexes will start with "tenant_id". The beauty of this solution is that you add an index once and all tenants benefit from it, as in the example below. You choose this multi-tenant design when you have thousands of tenants that you don't want to manage individually.
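
For example, if the application needs to look up accounts by branch, one index serves all tenants, with "tenant_id" in front. A sketch (this index is not part of the pgbench schema):

create index pgbench_accounts_bid on pgbench_accounts (tenant_id, bid);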

single-tenant queries

In my example, the application is responsible for switching to the right tenant. I connect as "saas_user" and show what can be seen when setting 'rls.tenant_id':

\c saas saas_user

psql (15devel, server 11.2-YB-2.7.2.0-b0)
You are now connected to database "saas" as user "saas_user".

-- no 'rls.tenant_id' set:

select * from pgbench_branches;

ERROR:  unrecognized configuration parameter "rls.tenant_id"

-- nonexistent 'rls.tenant_id' set:

set rls.tenant_id=42;
SET

select * from pgbench_branches;

 tenant_id | bid | bbalance | filler
-----------+-----+----------+--------
(0 rows)

-- existing 'rls.tenant_id' set:

set rls.tenant_id=9;
SET

select * from pgbench_branches;

 tenant_id | bid | bbalance | filler
-----------+-----+----------+--------
         9 |   1 |        0 |
(1 row)


Not setting the session variable raises an error, setting it to a nonexistent tenant just shows no rows, and setting it to an existing tenant filters on its rows.
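
If you prefer an empty result over an error when the variable is not set, the two-argument form current_setting(name, missing_ok) returns NULL instead of raising one. A possible variant of the policy expression, not what I used above:

create policy tenant_id on pgbench_branches for all to public
 using ( tenant_id = coalesce(current_setting('rls.tenant_id', true), '-1')::int );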

RLS transparency

We have seen that administration operations done by "saas_owner" are easy because there is no need to manage each tenant individually. It is also easy for the application: when connected as "saas_user" with 'rls.tenant_id' set, it can run the same code as before going multi-tenant.

Here is how I run pgbench on tenant_id=9 just by setting an environment variable:

PGOPTIONS="-c rls.tenant_id=9" pgbench -n -U saas_user saas -T 10

Setting the "tenant_id" for the session is the only thing you have to do. The application reads from the common tables, but the RLS policy restricts the rows, as a Virtual Private Database.

This works by adding an implicit WHERE clause to the statements, which is the reason why you should add "tenant_id" to all indexes. From this example, I gathered the queries that were run from pg_stat_statements.query, and verified the execution plans to be sure that the indexes are used:

\c saas saas_user

set rls.tenant_id=9;

saas=> explain analyze select count(*) from pgbench_branches;

                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16.25..16.26 rows=1 width=8) (actual time=0.412..0.412 rows=1 loops=1)
   ->  Index Scan using pgbench_branches_pkey on pgbench_branches  (cost=0.00..16.00 rows=100 width=0) (actual time=0.406..0.408 rows=1 loops=1)
         Index Cond: (tenant_id = (current_setting('rls.tenant_id'::text))::integer)
 Planning Time: 0.539 ms
 Execution Time: 0.902 ms
(5 rows)

saas=> explain analyze update pgbench_accounts SET abalance = abalance + 1 WHERE aid = 2;

                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Update on pgbench_accounts  (cost=0.00..4.12 rows=1 width=768) (actual time=1.203..1.203 rows=0 loops=1)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.00..4.12 rows=1 width=768) (actual time=1.124..1.125 rows=1 loops=1)
         Index Cond: ((tenant_id = (current_setting('rls.tenant_id'::text))::integer) AND (aid = 2))
 Planning Time: 4.667 ms
 Execution Time: 12.233 ms
(5 rows)


The 'Index Cond' is explicit: you see exactly what is executed after the RLS policy is applied, so you can be sure the indexes are correctly used.

partitions and geo-distribution

Having all rows stored in the same table is convenient. But maybe you want to physically isolate a few tenants. The 80-20 rule often applies here: 80% of the tenants are small and can be stored together, and 20% are important enough to be managed more carefully.

Having one schema to store all tenants is not a problem in a PostgreSQL-compatible database, because tables can be partitioned. Here is an example replacing the "pgbench_accounts" created above with a partitioned table where "tenant_id" 1 and 2 are stored in a different partition:

\c saas saas_owner

alter table pgbench_accounts rename to tmp_pgbench_accounts;

CREATE TABLE pgbench_accounts (
 tenant_id int default current_setting('rls.tenant_id')::int not null,
 aid integer NOT NULL,
 bid integer,
 abalance integer,
 filler character(84),
 PRIMARY KEY(tenant_id,aid)
) partition by list (tenant_id);

CREATE TABLE pgbench_accounts_vip
 partition of pgbench_accounts (
 tenant_id, aid, bid, abalance, filler
) for values in (1,2);

CREATE TABLE pgbench_accounts_others
 partition of pgbench_accounts (
 tenant_id, aid, bid, abalance, filler
) default;


When running the same RLS policies with this table definition, here are the execution plans for two tenants that are in different partitions:

saas=> \c saas saas_user
psql (15devel, server 11.2-YB-2.7.2.0-b0)
You are now connected to database "saas" as user "saas_user".

saas=> set rls.tenant_id=1;
SET

saas=> explain analyze select count(*) from pgbench_accounts;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=235.00..235.01 rows=1 width=8) (actual time=5.553..5.553 rows=1 loops=1)
   ->  Append  (cost=0.00..230.00 rows=2000 width=0) (actual time=5.549..5.549 rows=0 loops=1)
         Subplans Removed: 1
         ->  Seq Scan on pgbench_accounts_vip  (cost=0.00..110.00 rows=1000 width=0) (actual time=5.548..5.548 rows=0 loops=1)
               Filter: (tenant_id = (current_setting('rls.tenant_id'::text))::integer)
 Planning Time: 1.294 ms
 Execution Time: 5.638 ms
(7 rows)

saas=>
saas=> set rls.tenant_id=0;
SET
saas=> explain analyze select count(*) from pgbench_accounts;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=235.00..235.01 rows=1 width=8) (actual time=853.215..853.215 rows=1 loops=1)
   ->  Append  (cost=0.00..230.00 rows=2000 width=0) (actual time=9.033..843.702 rows=100000 loops=1)
         Subplans Removed: 1
         ->  Seq Scan on pgbench_accounts_others  (cost=0.00..110.00 rows=1000 width=0) (actual time=9.032..832.047 rows=100000 loops=1)
               Filter: (tenant_id = (current_setting('rls.tenant_id'::text))::integer)
 Planning Time: 0.992 ms
 Execution Time: 853.290 ms
(7 rows)

The beauty of it is that a Seq Scan can still be relevant here because it scans only the required partition.

In YugabyteDB, you can go further. Partitions can be mapped to tablespaces, and tablespaces to nodes. All this is described in the Row Level Geo-Partitioning documentation. This means that you can control the geo-location of tenants, or groups of tenants, if you want to: you manage all your tenants in one schema, but still control their physical location.
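
A sketch of what this can look like, with a partition pinned to one region through a tablespace (the cloud/region/zone values and the tenant list are illustrative, not from my lab):

CREATE TABLESPACE eu_tablespace WITH (replica_placement='{
 "num_replicas": 3, "placement_blocks": [
  {"cloud":"aws","region":"eu-west-1","zone":"eu-west-1a","min_num_replicas":1},
  {"cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","min_num_replicas":1},
  {"cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","min_num_replicas":1}
 ]}');

CREATE TABLE pgbench_accounts_eu partition of pgbench_accounts
 for values in (3,4) tablespace eu_tablespace;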

views

For the moment, the "tenant_id" column is visible when querying the table as "saas_user":


saas=> set rls.tenant_id=9;
SET

saas=> select * from pgbench_accounts limit 10;

 tenant_id | aid | bid | abalance |                                        filler
-----------+-----+-----+----------+--------------------------------------------------------------------------------------
         9 |   1 |   1 |        0 |
         9 |   2 |   1 |        2 |
         9 |   3 |   1 |        0 |
         9 |   4 |   1 |        0 |
         9 |   5 |   1 |        0 |
         9 |   6 |   1 |        0 |
         9 |   7 |   1 |        0 |
         9 |   8 |   1 |        0 |
         9 |   9 |   1 |        0 |
         9 |  10 |   1 |        0 |
(10 rows)


This should not be a problem, as your application lists the columns explicitly. select * is handy when querying interactively, but should be avoided in application code because the schema may change.

Anyway, if you want full transparency, you can create a view on the tables without the "tenant_id" column:

\c saas postgres
create schema saas_user;
grant create on schema saas_user to saas_user;
\c saas saas_user
create view saas_user.pgbench_accounts as select aid, bid, abalance, filler from public.pgbench_accounts;

Given that the view's schema ("$user", which resolves to saas_user) comes first in the "search_path", this is what the queries will see:

saas=> show search_path;
   search_path
-----------------
 "$user", public
(1 row)

saas=> set rls.tenant_id=9;
SET

saas=> select * from pgbench_accounts limit 10;
 aid | bid | abalance |                                        filler
-----+-----+----------+--------------------------------------------------------------------------------------
   1 |   1 |        0 |
   2 |   1 |        2 |
   3 |   1 |        0 |
   4 |   1 |        0 |
   5 |   1 |        0 |
   6 |   1 |        0 |
   7 |   1 |        0 |
   8 |   1 |        0 |
   9 |   1 |        0 |
  10 |   1 |        0 |
(10 rows)

⚠️ Note that if you create the view as the saas_owner, the policy will not be applied, because the view uses the definer's privileges. This is why I created it as another user.

Optimizer statistics

Now that the data distribution may be very different across tenants, you may want to gather statistics to get histograms on "tenant_id":

\c saas saas_owner

analyze pgbench_accounts;
analyze pgbench_branches;
analyze pgbench_history;
analyze pgbench_tellers;

saas=> \x
Expanded display is on.
saas=>
select * from pg_stats where attname='tenant_id';
-[ RECORD 1 ]----------+------------------------------------------
schemaname             | public
tablename              | pgbench_accounts
attname                | tenant_id
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | 4
most_common_vals       | {2,8,3,0}
most_common_freqs      | {0.279533,0.2726,0.266567,0.1813}
histogram_bounds       |
correlation            | 0.252514
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |
-[ RECORD 2 ]----------+------------------------------------------
schemaname             | public
tablename              | pgbench_branches
attname                | tenant_id
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | -1
most_common_vals       |
most_common_freqs      |
histogram_bounds       | {0,1,2,3,4,5,6,7,8,9}
correlation            | 0.515152
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |
-[ RECORD 3 ]----------+------------------------------------------
schemaname             | public
tablename              | pgbench_tellers
attname                | tenant_id
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | 10
most_common_vals       | {0,1,2,3,4,5,6,7,8,9}
most_common_freqs      | {0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1}
histogram_bounds       |
correlation            | 0.519952
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |

Knowing how I built the data, by copying the same rows into the 10 tenants, n_distinct should be 10 for all three tables: it looks like the sample size should be increased.
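
On PostgreSQL, the per-column sample is raised with ALTER TABLE ... SET STATISTICS before re-analyzing. A sketch (1000 is an arbitrary target):

alter table pgbench_accounts alter column tenant_id set statistics 1000;
analyze pgbench_accounts;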

Note that there may be some correlation between the tenant_id and another column. For example, if your tenants are located in specific countries, there is a high correlation between a customer's country and the tenant_id. You may want to CREATE STATISTICS on the column group, as shown below, to make the query planner aware of it. YugabyteDB does not support this yet: ANALYZE was only recently introduced, in version 2.9. Anyway, without per-tenant partitioning, we should aim at Index Scans to benefit from the sharding.
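
For reference, on PostgreSQL it would look like the following sketch, with a hypothetical "country" column that is not in the pgbench schema:

create statistics tenant_country_stats (dependencies) on tenant_id, country from pgbench_accounts;
analyze pgbench_accounts;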

This post showed the RLS solution used in a practical way, with maximum transparency for the application. I'll write soon about the other solutions: isolation per schema, database, or cluster, and their limits on resource consumption. However, in all cases, the solution described here can still be split into multiple schemas, some dedicated to one large tenant and others consolidating many tenants into one. So having this "tenant_id" column may always be useful. Even if it is hidden behind views, default values, and RLS policies, one thing remains: convince the developers that a composite key is not evil. I'll talk about it at the Russian Java User Group Joker Conf.
