Franck Pachot
Posted on November 28, 2024
Many people believe that implementing sharding on PostgreSQL-compatible databases ensures complete compatibility with PostgreSQL. However, sharding serves as an additional layer, and we have identified some limitations in the previous posts of this series.
As a result, you cannot run any existing PostgreSQL application on Aurora Limitless without modifying the data model.
To illustrate this, let's run the simple PgBench table creation. If you try to run pgbench --initialize, you will encounter many errors. I'll summarize them by running each step of the initialization from pgbench -Idtpfg.
Table creation with pgbench -iIt failed with:
creating tables...
pgbench: error: query failed: ERROR: STORAGE PARAMETER is not supported
pgbench: detail: Query was: create table pgbench_tellers(tid int not null,bid int,tbalance int,filler char(84)) with (fillfactor=100)
With PostgreSQL heap tables, the fill factor can reserve some empty space on each page to prevent the migration of new row versions to a different page, which would require updating all indexes. However, this feature is not supported in Aurora Limitless. A workaround for this limitation is to remove the storage clause.
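For example, the same table definition is accepted once the storage clause is removed. A minimal sketch based on the statement above:
-- same columns as the pgbench DDL, without with (fillfactor=100)
create table pgbench_tellers(tid int not null, bid int, tbalance int, filler char(84));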
Anyway, as we have seen in the previous posts, I cannot run the DDL as-is and must define which tables are sharded or reference. I create pgbench_accounts and pgbench_history with sharding on the account identifier aid with:
set rds_aurora.limitless_create_table_mode=sharded;
set rds_aurora.limitless_create_table_shard_key='{"aid"}';
The pgbench_branches and pgbench_tellers tables do not have a sharding key, so I create them as reference tables with:
set rds_aurora.limitless_create_table_mode=reference;
Note that I decided to distribute the accounts. If the application were multitenant, with thousands of branches, I could have sharded all tables on the branch identifier bid.
Sharding must be part of the upfront design when going to Aurora Limitless, and you must know how it will have to scale.
Once the tables were created, the primary key creation with pgbench -iIp failed with:
pgbench: error: query failed: ERROR: relation "public.pgbench_branches" must have PRIMARY KEY to be transformed into a reference table
pgbench: error: query failed: ERROR: relation "public.pgbench_tellers" must have PRIMARY KEY to be transformed into a reference table
Because all changes to the reference tables must be replicated to all shards, they must have a primary key. This is similar to logical replication. With Aurora Limitless, you cannot create the heap table first and add the primary key later. The workaround is to add the PRIMARY KEY clause to the CREATE TABLE.
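For example, a sketch of the reference tables with the primary key declared inline, based on the pgbench DDL above:
set rds_aurora.limitless_create_table_mode=reference;
-- primary key declared in CREATE TABLE, not added afterwards
create table pgbench_branches(bid int not null primary key, bbalance int, filler char(88));
create table pgbench_tellers(tid int not null primary key, bid int, tbalance int, filler char(84));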
Once the tables were created with the primary key, the foreign key creation with pgbench -iIf failed with:
pgbench: error: query failed: ERROR: FOREIGN KEY between SHARDED TABLES supported only for COLLOCATED TABLES
DETAIL: Table public.pgbench_history and public.pgbench_accounts are not collocated
pgbench: detail: Query was: alter table pgbench_history add constraint pgbench_history_aid_fkey foreign key (aid) references pgbench_accounts
Creating tables with the same sharding key is not enough. The tables must be colocated to enable all SQL features, including referential integrity: colocation guarantees that matching slices stay together if they are moved between shards. Referential integrity is enforced within a single shard, so the rows linked by a foreign key must reside on the same shard.
The solution is to create pgbench_history with colocation set to pgbench_accounts:
set rds_aurora.limitless_create_table_mode=sharded;
set rds_aurora.limitless_create_table_shard_key='{"aid"}';
set rds_aurora.limitless_create_table_collocate_with="pgbench_accounts";
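With the tables colocated, the foreign key from pgbench_history to pgbench_accounts, the statement that failed above, can now be created:
alter table pgbench_history add constraint pgbench_history_aid_fkey foreign key (aid) references pgbench_accounts;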
The client-side generation of data with pgbench -iIg failed with:
ERROR: cannot perform COPY FREEZE on a partitioned table
pgbench: error: PQendcopy failed
The FREEZE option, which sets the hint bits to avoid overhead on the next reads, is unsupported. The solution is to use server-side generation with pgbench -iIG, which uses INSERT instead of COPY.
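For reference, server-side generation loads the accounts with an INSERT ... SELECT over generate_series. This is a rough sketch for scale factor 1; the exact statement is built internally by pgbench and may differ:
insert into pgbench_accounts(aid, bid, abalance, filler)
select aid, (aid - 1) / 100000 + 1, 0, ''  -- 100000 accounts per branch, zero balance, empty filler
from generate_series(1, 100000) as aid;    -- scale factor 1: 100000 accounts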
Dropping the tables with pgbench -iId failed with:
ERROR: failed to execute remote query with message: cannot drop table pgbench_accounts because other objects depend on it
DETAIL: constraint pgbench_history_aid_fkey on table pgbench_history depends on table pgbench_accounts
HINT: Use DROP ... CASCADE to drop the dependent objects too.
In PostgreSQL, dropping all tables in the same statement is allowed because the drop is atomic and doesn't break referential integrity. The workaround here is to add CASCADE to the DROP statement.
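For example, with the dependent foreign key in place, the drop only works as:
drop table if exists pgbench_accounts, pgbench_branches, pgbench_history, pgbench_tellers cascade;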
Finally, once all tables are created, the default pgbench fails with:
pgbench (16.2, server 16.4)
pgbench: error: unexpected partition method: "H"
PgBench detects the partitioning method by querying partstrat from pg_partitioned_table and expects h for HASH and r for RANGE. Aurora Limitless returns an uppercase H that is not recognized by PgBench.
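You can check what the routers expose with a catalog query similar to the one PgBench runs; this is a simplified sketch, not the exact pgbench query:
select c.relname, p.partstrat
from pg_catalog.pg_partitioned_table p
join pg_catalog.pg_class c on c.oid = p.partrelid
where c.relname like 'pgbench_%';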
Note that you cannot create partitioned tables in Aurora Limitless, like pgbench -i --partition-method=HASH --partitions 2, because partitioning is already used internally by the routers and shards. The workaround is to generate the scripts and skip the PgBench partition checking:
pgbench --show-script=tpcb 2> tpcb.sql
pgbench --show-script=simple-update 2> simple-update.sql
simple-update.sql is the builtin "simple update" that updates a random account balance and inserts the history:
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
tpcb.sql is the builtin "TPC-B (sort of)" that does the same and also updates the total balance for the teller and branch:
-- tpcb-like: <builtin: TPC-B (sort of)>
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
Here is the rewritten DDL (Data Definition Language, creating the tables) and DML (Data Manipulation Language, loading the data) to initialize the PgBench tables in Aurora Limitless:
\c
drop table if exists pgbench_branches, pgbench_tellers, pgbench_accounts, pgbench_history cascade;
-- sharding "pgbench_accounts" on "aid"
set rds_aurora.limitless_create_table_mode=sharded;
set rds_aurora.limitless_create_table_shard_key='{"aid"}';
CREATE TABLE public.pgbench_accounts (
aid integer NOT NULL primary key
, bid integer, abalance integer, filler character(84)
) ;
-- collocated "pgbench_history" with "pgbench_accounts"
set rds_aurora.limitless_create_table_collocate_with="pgbench_accounts";
CREATE TABLE public.pgbench_history (
tid integer, bid integer, aid integer, delta integer, mtime timestamp without time zone, filler character(22)
);
-- create "pgbench_branches" and "pgbench_tellers" as reference tables
set rds_aurora.limitless_create_table_mode=reference;
CREATE TABLE public.pgbench_branches (
bid integer NOT NULL primary key
, bbalance integer, filler character(88)
) ;
CREATE TABLE public.pgbench_tellers (
tid integer NOT NULL primary key
, bid integer, tbalance integer, filler character(84)
) ;
-- add foreign keys and generate data
\! pgbench -iIGf
-- generate the scripts
\! pgbench --show-script=tpcb 2> tpcb.sql
\! pgbench --show-script=simple-update 2> simple-update.sql
I will run both workloads using ten connections for 30 seconds, displaying the throughput every five seconds.
\! pgbench -c 10 -P 5 -T 30 -r -f tpcb.sql
\! pgbench -c 10 -P 5 -T 30 -r -f simple-update.sql
Here is the TPC-B workload:
postgres_limitless=> \! pgbench -c 10 -P 5 -T 30 -r -f tpcb.sql
pgbench (16.2, server 16.4)
starting vacuum...end.
progress: 5.0 s, 5.4 tps, lat 771.930 ms stddev 323.045, 0 failed
progress: 10.0 s, 12.0 tps, lat 814.262 ms stddev 313.988, 0 failed
progress: 15.0 s, 11.4 tps, lat 875.883 ms stddev 425.352, 0 failed
progress: 20.0 s, 12.6 tps, lat 832.509 ms stddev 443.109, 0 failed
progress: 25.0 s, 12.6 tps, lat 780.567 ms stddev 350.805, 0 failed
progress: 30.0 s, 12.4 tps, lat 805.138 ms stddev 508.859, 0 failed
transaction type: tpcb.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
maximum number of tries: 1
duration: 30 s
number of transactions actually processed: 342
number of failed transactions: 0 (0.000%)
latency average = 817.535 ms
latency stddev = 406.085 ms
initial connection time = 2415.585 ms
tps = 12.085738 (without initial connection time)
statement latencies in milliseconds and failures:
0.003 0 \set aid random(1, 100000 * :scale)
0.001 0 \set bid random(1, 1 * :scale)
0.001 0 \set tid random(1, 10 * :scale)
0.001 0 \set delta random(-5000, 5000)
32.680 0 BEGIN;
37.159 0 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
35.044 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
284.577 0 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
347.542 0 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
35.407 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
45.119 0 END;
postgres_limitless=>
The throughput is low, and the latency comes from updating pgbench_tellers and pgbench_branches, which were declared as reference tables, so every update must be broadcast to all nodes. Database sharding is inefficient when tables that are written frequently lack a common sharding key.
The simple update workload works better:
postgres_limitless=> \! pgbench -c 10 -P 5 -T 30 -r -f simple-update.sql
pgbench (16.2, server 16.4)
starting vacuum...end.
progress: 5.0 s, 26.4 tps, lat 187.265 ms stddev 29.313, 0 failed
progress: 10.0 s, 55.8 tps, lat 179.346 ms stddev 25.351, 0 failed
progress: 15.0 s, 53.0 tps, lat 188.055 ms stddev 32.336, 0 failed
progress: 20.0 s, 56.0 tps, lat 178.569 ms stddev 16.320, 0 failed
progress: 25.0 s, 56.8 tps, lat 175.783 ms stddev 12.029, 0 failed
progress: 30.0 s, 57.6 tps, lat 173.895 ms stddev 9.150, 0 failed
transaction type: simple-update.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
maximum number of tries: 1
duration: 30 s
number of transactions actually processed: 1538
number of failed transactions: 0 (0.000%)
latency average = 179.656 ms
latency stddev = 22.099 ms
initial connection time = 2446.539 ms
tps = 55.493275 (without initial connection time)
statement latencies in milliseconds and failures:
0.002 0 \set aid random(1, 100000 * :scale)
0.001 0 \set bid random(1, 1 * :scale)
0.001 0 \set tid random(1, 10 * :scale)
0.001 0 \set delta random(-5000, 5000)
33.382 0 BEGIN;
37.896 0 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
35.332 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
35.802 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
37.241 0 END;
postgres_limitless=>
The simple update workload is scalable on Aurora Limitless because all tables subject to DML (Data Manipulation Language) belong to a single shard. This is straightforward when examining the code: all statements use the same :aid, the shard key.
This PgBench example effectively illustrates Aurora Limitless's limitations. It is usable for applications designed for sharding, a constraint not unique to Aurora, as Microsoft Citus and Oracle Sharding exhibit the same limitations. This distinction highlights the difference between sharded databases and distributed SQL databases.
To analyze a sharding-friendly application, you can use PgBench with multiple branches and take the branch identifier (bid), which is included in all tables, as the sharding key. This approach is particularly effective if you have numerous branches and the accounts and activity are evenly distributed among them. For example, you can create a hundred branches by running pgbench -i --scale 100.
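A minimal sketch of that multitenant layout, reusing the session settings shown earlier. It assumes all tables can be sharded on bid and collocated with pgbench_branches; whether the primary keys must then include the shard key is an assumption I have not verified here:
-- shard everything on the branch identifier
set rds_aurora.limitless_create_table_mode=sharded;
set rds_aurora.limitless_create_table_shard_key='{"bid"}';
CREATE TABLE public.pgbench_branches (
 bid integer NOT NULL primary key
 , bbalance integer, filler character(88)
);
-- collocate the other tables with pgbench_branches so joins and foreign keys stay on one shard
set rds_aurora.limitless_create_table_collocate_with="pgbench_branches";
CREATE TABLE public.pgbench_history (
 tid integer, bid integer, aid integer, delta integer, mtime timestamp without time zone, filler character(22)
);
-- pgbench_accounts and pgbench_tellers would be created the same way, keyed on bid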
The PgBench schema is designed this way because it is based on TPC benchmarks from 30 years ago, at a time when sharding was the only solution for scaling databases. This approach still holds relevance for multitenant applications today. However, there may be instances of certain shards being significantly larger than others. Since Aurora Limitless utilizes Aurora Serverless, resource utilization can still be balanced effectively, even if the workload is not evenly distributed.
Aurora Serverless simplifies sharding operations and offers a significant advantage over Citus: even though multi-shard transactions may not be optimal, they are ACID-compliant. The next post will show this using the PgBench tables.