Franck Pachot
Posted on November 25, 2024
In the previous posts, I created the 16 ACU cluster, which consists of two routers and two shards. Now, let's create a sharded table and observe how it is distributed.
Before running a PostgreSQL-compatible DDL to create the table, I defined the sharded mode and the sharding key:
postgres_limitless=> set rds_aurora.limitless_create_table_mode=sharded;
SET
postgres_limitless=> set rds_aurora.limitless_create_table_shard_key='{"id"}';
SET
postgres_limitless=> \timing on
Timing is on.
postgres_limitless=> create table my_sharded_table( id bigserial primary key, value int );
CREATE TABLE
Time: 1613.673 ms (00:01.614)
postgres_limitless=>
This appears to be a partitioned table with remote partitions, accessed through the Foreign Data Wrapper:
postgres_limitless=> \d
List of relations
Schema | Name | Type | Owner
--------+--------------------------+-------------------+-----------------------------------
public | my_sharded_table | partitioned table | postgres
public | my_sharded_table_fs00001 | foreign table | rds_aurora_limitless_tables_owner
public | my_sharded_table_fs00002 | foreign table | rds_aurora_limitless_tables_owner
public | my_sharded_table_id_seq | sequence | postgres
(4 rows)
postgres_limitless=> \d my_sharded_table
Partitioned table "public.my_sharded_table"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+----------------------------------------------
id | bigint | | not null | nextval('my_sharded_table_id_seq'::regclass)
value | integer | | |
Partition key: HASH (id)
Number of partitions: 2 (Use \d+ to list them.)
postgres_limitless=> \d+ my_sharded_table
Partitioned table "public.my_sharded_table"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+----------------------------------------------+---------+-------------+--------------+-------------
id | bigint | | not null | nextval('my_sharded_table_id_seq'::regclass) | plain | | |
value | integer | | | | plain | | |
Partition key: HASH (id)
Partitions: my_sharded_table_fs00001 FOR VALUES FROM (MINVALUE) TO ('0'), FOREIGN,
my_sharded_table_fs00002 FOR VALUES FROM ('0') TO (MAXVALUE), FOREIGN
The last two lines of the \d+ output are crucial for understanding how Aurora Limitless operates. Although the partition key is declared as HASH, the partitions are defined by ranges over the hash values (the FOR VALUES FROM ... TO clauses). These partitions are virtual: each one is a foreign table accessed through the Foreign Data Wrapper (FOREIGN).
A hash function is applied to the sharding key, which must be part of the primary key, as with native PostgreSQL partitioning. Each shard is then assigned a specific range of hash values.
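To make the range-over-hash idea concrete, here is a minimal sketch in Python. It is not Aurora's actual hash function or routing code; the `SHARD_RANGES` boundaries simply mirror the `FOR VALUES FROM (MINVALUE) TO ('0')` and `FROM ('0') TO (MAXVALUE)` bounds shown above, splitting the signed 64-bit hash space at zero.

```python
# Conceptual sketch of range-over-hash routing (hypothetical code, not
# Aurora internals): hash the sharding key to a signed 64-bit value and
# find the shard whose range covers it.
import hashlib

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

# Hypothetical ranges mirroring FOR VALUES FROM (MINVALUE) TO ('0')
# and FROM ('0') TO (MAXVALUE) in the \d+ output.
SHARD_RANGES = [
    ("shard_5", INT64_MIN, 0),      # [MINVALUE, 0)
    ("shard_4", 0, INT64_MAX + 1),  # [0, MAXVALUE]
]

def hash_key(key: int) -> int:
    """Map the sharding key to a signed 64-bit hash value."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def route(key: int) -> str:
    """Return the shard whose hash range covers this key."""
    h = hash_key(key)
    for shard, lo, hi in SHARD_RANGES:
        if lo <= h < hi:
            return shard
    raise ValueError("no shard covers this hash value")
```

Because the ranges tile the whole hash space, every key lands on exactly one shard, and routing is deterministic: the same `id` always goes to the same shard.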
Each foreign table partition references a server, _rds_aurora_internal_shard_4_0 or _rds_aurora_internal_shard_5_0, which are internal references to the shard nodes with subcluster_id 4 and 5.
postgres_limitless=> \d my_sharded_table_fs00001
Foreign table "public.my_sharded_table_fs00001"
Column | Type | Collation | Nullable | Default | FDW options
--------+---------+-----------+----------+---------+-------------
id | bigint | | not null | |
value | integer | | | |
Partition of: my_sharded_table FOR VALUES FROM (MINVALUE) TO ('0')
Server: _rds_aurora_internal_shard_5_0
FDW options: (schema_name 'public', table_name 'my_sharded_table', aurora_table_type 'sharded')
postgres_limitless=> \d my_sharded_table_fs00002
Foreign table "public.my_sharded_table_fs00002"
Column | Type | Collation | Nullable | Default | FDW options
--------+---------+-----------+----------+---------+-------------
id | bigint | | not null | |
value | integer | | | |
Partition of: my_sharded_table FOR VALUES FROM ('0') TO (MAXVALUE)
Server: _rds_aurora_internal_shard_4_0
FDW options: (schema_name 'public', table_name 'my_sharded_table', aurora_table_type 'sharded')
postgres_limitless=> select distinct dest_subcluster,dest_type,dest_instance_az
from rds_aurora.limitless_stat_subclusters
natural join (select subcluster_id as dest_subcluster, subcluster_type as dest_type from rds_aurora.limitless_subclusters)
where dest_subcluster != orig_subcluster order by 1
;
dest_subcluster | dest_type | dest_instance_az
-----------------+-----------+------------------
2 | router | eu-west-1a
3 | router | eu-west-1b
4 | shard | eu-west-1a
5 | shard | eu-west-1b
(4 rows)
Having only two partitions would make it difficult to re-shard when scaling out, as splitting them would be an expensive operation that affects online activity. The two partitions we observe are logical from the router's perspective. Within each shard, the remote tables are further partitioned. This can be seen when we run ANALYZE in verbose mode.
postgres_limitless=> analyze verbose my_sharded_table;
INFO: analyzing "public.my_sharded_table" inheritance tree
INFO: analyzing "public.my_sharded_table_ts00001"
INFO: "my_sharded_table_ts00001": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.my_sharded_table_ts00002"
INFO: "my_sharded_table_ts00002": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.my_sharded_table_ts00003"
INFO: "my_sharded_table_ts00003": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.my_sharded_table_ts00004"
INFO: "my_sharded_table_ts00004": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.my_sharded_table_ts00005"
INFO: "my_sharded_table_ts00005": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.my_sharded_table_ts00006"
INFO: "my_sharded_table_ts00006": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.my_sharded_table_ts00007"
INFO: "my_sharded_table_ts00007": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.my_sharded_table_ts00008"
INFO: "my_sharded_table_ts00008": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
...
INFO: analyzing "public.my_sharded_table_ts00254"
INFO: "my_sharded_table_ts00254": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.my_sharded_table_ts00255"
INFO: "my_sharded_table_ts00255": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.my_sharded_table_ts00256"
INFO: "my_sharded_table_ts00256": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.my_sharded_table" inheritance tree
INFO: analyzing "public.my_sharded_table_ts00257"
INFO: "my_sharded_table_ts00257": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.my_sharded_table_ts00258"
INFO: "my_sharded_table_ts00258": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
...
INFO: analyzing "public.my_sharded_table_ts00509"
INFO: "my_sharded_table_ts00509": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.my_sharded_table_ts00510"
INFO: "my_sharded_table_ts00510": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.my_sharded_table_ts00511"
INFO: "my_sharded_table_ts00511": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.my_sharded_table_ts00512"
INFO: "my_sharded_table_ts00512": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.my_sharded_table" inheritance tree
ANALYZE
Time: 215.724 ms
My table is physically divided into 512 table slices. I have two shards, each covering a range of hash values with 256 table slices. Aurora Limitless can rebalance these slices when the shard group scales horizontally. It may look as though it splits the partitions to re-shard the table, but in reality it only moves existing slices between shards.
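The benefit of many small slices can be sketched in a few lines of Python. This is hypothetical code, not Aurora internals: it models 512 slices split 256/256 across two shards, then adds a third shard by moving whole slices to it, never splitting one.

```python
# Conceptual sketch of slice rebalancing (hypothetical code, not Aurora
# internals): adding a shard drains whole slices from over-loaded shards
# into the newcomer; no slice is ever split in place.
from collections import Counter

def rebalance(assignment, new_shard):
    """Move whole slices to new_shard until load is roughly even."""
    counts = Counter(assignment.values())
    target = len(assignment) // (len(counts) + 1)  # even load incl. new shard
    moved = []
    for s in sorted(assignment):
        owner = assignment[s]
        if counts[owner] > target:
            counts[owner] -= 1
            assignment[s] = new_shard
            moved.append(s)
    return moved

# 512 slices split 256/256 across the two shards observed above.
assignment = {s: ("shard_5" if s <= 256 else "shard_4") for s in range(1, 513)}
moved = rebalance(assignment, "shard_6")
```

Only about a third of the slices move, and each moves as a unit. That is what makes online re-sharding cheap compared with splitting two big partitions under load.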
Another way to visualize these details is by using the EXPLAIN command when scanning the entire table. By default, EXPLAIN only displays the execution plan on the router:
postgres_limitless=> show rds_aurora.limitless_explain_options;
rds_aurora.limitless_explain_options
--------------------------------------
single_shard_optimization
(1 row)
postgres_limitless=> explain (verbose) select * from my_sharded_table;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Append (cost=100.00..200.01 rows=2 width=12)
-> Async Foreign Scan on public.my_sharded_table_fs00001 my_sharded_table_1 (cost=100.00..100.00 rows=1 width=12)
Output: my_sharded_table_1.id, my_sharded_table_1.value
Remote SQL: SELECT id, value FROM public.my_sharded_table
-> Async Foreign Scan on public.my_sharded_table_fs00002 my_sharded_table_2 (cost=100.00..100.00 rows=1 width=12)
Output: my_sharded_table_2.id, my_sharded_table_2.value
Remote SQL: SELECT id, value FROM public.my_sharded_table
Query Identifier: 4166512535977862222
(8 rows)
By adding shard_plans to limitless_explain_options, the plans on each shard become visible:
postgres_limitless=> set rds_aurora.limitless_explain_options=shard_plans, single_shard_optimization;
SET
postgres_limitless=> explain (verbose off) select * from my_sharded_table;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Append (cost=100.00..200.01 rows=2 width=12)
-> Async Foreign Scan on my_sharded_table_fs00001 my_sharded_table_1 (cost=100.00..100.00 rows=1 width=12)
Remote Plans from Shard 5:
Append (cost=0.00..1.28 rows=256 width=12)
-> Seq Scan on my_sharded_table_ts00001 my_sharded_table_1 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00002 my_sharded_table_2 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00003 my_sharded_table_3 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00004 my_sharded_table_4 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00005 my_sharded_table_5 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00006 my_sharded_table_6 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00007 my_sharded_table_7 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00008 my_sharded_table_8 (cost=0.00..0.00 rows=1 width=12)
...
-> Seq Scan on my_sharded_table_ts00252 my_sharded_table_252 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00253 my_sharded_table_253 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00254 my_sharded_table_254 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00255 my_sharded_table_255 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00256 my_sharded_table_256 (cost=0.00..0.00 rows=1 width=12)
-> Async Foreign Scan on my_sharded_table_fs00002 my_sharded_table_2 (cost=100.00..100.00 rows=1 width=12)
Remote Plans from Shard 4:
Append (cost=0.00..1.28 rows=256 width=12)
-> Seq Scan on my_sharded_table_ts00257 my_sharded_table_1 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00258 my_sharded_table_2 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00259 my_sharded_table_3 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00260 my_sharded_table_4 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00261 my_sharded_table_5 (cost=0.00..0.00 rows=1 width=12)
...
-> Seq Scan on my_sharded_table_ts00507 my_sharded_table_251 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00508 my_sharded_table_252 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00509 my_sharded_table_253 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00510 my_sharded_table_254 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00511 my_sharded_table_255 (cost=0.00..0.00 rows=1 width=12)
-> Seq Scan on my_sharded_table_ts00512 my_sharded_table_256 (cost=0.00..0.00 rows=1 width=12)
(519 rows)
Cross-shard queries are not optimal for performance, taking anywhere from 8 to over 100 milliseconds on this empty table:
postgres_limitless=> set rds_aurora.limitless_explain_options=single_shard_optimization;
SET
postgres_limitless=> explain (analyze) select * from my_sharded_table
postgres_limitless-> \watch c=5
Sat 23 Nov 2024 12:01:09 AM GMT (every 2s)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=100.00..200.01 rows=2 width=12) (actual time=6.170..6.172 rows=0 loops=1)
-> Async Foreign Scan on my_sharded_table_fs00001 my_sharded_table_1 (cost=100.00..100.00 rows=1 width=12) (actual time=2.468..2.468 rows=0 loops=1)
-> Async Foreign Scan on my_sharded_table_fs00002 my_sharded_table_2 (cost=100.00..100.00 rows=1 width=12) (actual time=2.875..2.875 rows=0 loops=1)
Planning Time: 0.090 ms
Execution Time: 90.373 ms
(5 rows)
Sat 23 Nov 2024 12:01:11 AM GMT (every 2s)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=100.00..200.01 rows=2 width=12) (actual time=6.028..6.028 rows=0 loops=1)
-> Async Foreign Scan on my_sharded_table_fs00001 my_sharded_table_1 (cost=100.00..100.00 rows=1 width=12) (actual time=2.415..2.415 rows=0 loops=1)
-> Async Foreign Scan on my_sharded_table_fs00002 my_sharded_table_2 (cost=100.00..100.00 rows=1 width=12) (actual time=2.866..2.866 rows=0 loops=1)
Planning Time: 0.089 ms
Execution Time: 8.560 ms
(5 rows)
Sat 23 Nov 2024 12:01:13 AM GMT (every 2s)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=100.00..200.01 rows=2 width=12) (actual time=6.593..6.594 rows=0 loops=1)
-> Async Foreign Scan on my_sharded_table_fs00001 my_sharded_table_1 (cost=100.00..100.00 rows=1 width=12) (actual time=2.413..2.413 rows=0 loops=1)
-> Async Foreign Scan on my_sharded_table_fs00002 my_sharded_table_2 (cost=100.00..100.00 rows=1 width=12) (actual time=3.364..3.364 rows=0 loops=1)
Planning Time: 0.089 ms
Execution Time: 11.524 ms
(5 rows)
Sat 23 Nov 2024 12:01:15 AM GMT (every 2s)
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=100.00..200.01 rows=2 width=12) (actual time=85.985..85.986 rows=0 loops=1)
-> Async Foreign Scan on my_sharded_table_fs00001 my_sharded_table_1 (cost=100.00..100.00 rows=1 width=12) (actual time=13.688..13.688 rows=0 loops=1)
-> Async Foreign Scan on my_sharded_table_fs00002 my_sharded_table_2 (cost=100.00..100.00 rows=1 width=12) (actual time=71.442..71.442 rows=0 loops=1)
Planning Time: 0.089 ms
Execution Time: 102.063 ms
(5 rows)
Sat 23 Nov 2024 12:01:17 AM GMT (every 2s)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=100.00..200.01 rows=2 width=12) (actual time=6.008..6.009 rows=0 loops=1)
-> Async Foreign Scan on my_sharded_table_fs00001 my_sharded_table_1 (cost=100.00..100.00 rows=1 width=12) (actual time=2.409..2.409 rows=0 loops=1)
-> Async Foreign Scan on my_sharded_table_fs00002 my_sharded_table_2 (cost=100.00..100.00 rows=1 width=12) (actual time=2.894..2.894 rows=0 loops=1)
Planning Time: 0.091 ms
Execution Time: 8.376 ms
(5 rows)
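The Async Foreign Scan nodes above hint at how the router executes this: it scatters the query to both shards concurrently and gathers the results. A toy asyncio sketch (hypothetical code, not Aurora's implementation) shows why the elapsed time tracks the slowest shard rather than the sum of both:

```python
# Conceptual scatter-gather sketch (hypothetical, not Aurora code): the
# router queries both shards concurrently, so total latency is driven by
# the slowest shard, not the sum of all shard latencies.
import asyncio
import time

async def query_shard(name, latency):
    # Stand-in for an Async Foreign Scan against one shard.
    await asyncio.sleep(latency)
    return (name, [])  # empty table: zero rows back

async def scatter_gather(shards):
    # Await all shards concurrently, like the Append over async scans.
    return await asyncio.gather(*(query_shard(n, l) for n, l in shards))

start = time.perf_counter()
results = asyncio.run(scatter_gather([("shard_4", 0.2), ("shard_5", 0.1)]))
elapsed = time.perf_counter() - start
```

With simulated shard latencies of 200 ms and 100 ms, the whole round trip takes roughly 200 ms, not 300 ms, which matches the per-scan actual times seen in the EXPLAIN (ANALYZE) output.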
There are also some details in the global views:
postgres_limitless=> select subcluster_id, subcluster_type, schemaname, relname, seq_scan, analyze_count
from rds_aurora.limitless_stat_all_tables
where relname like 'my_sharded_table%'
order by relname, subcluster_type, subcluster_id;
subcluster_id | subcluster_type | schemaname | relname | seq_scan | analyze_count
---------------+-----------------+------------+--------------------------+----------+---------------
2 | router | public | my_sharded_table | 0 | 1
3 | router | public | my_sharded_table | 0 | 1
4 | shard | public | my_sharded_table | 0 | 1
5 | shard | public | my_sharded_table | 0 | 1
5 | shard | public | my_sharded_table_ts00001 | 43 | 1
5 | shard | public | my_sharded_table_ts00002 | 43 | 1
5 | shard | public | my_sharded_table_ts00003 | 43 | 1
5 | shard | public | my_sharded_table_ts00004 | 43 | 1
5 | shard | public | my_sharded_table_ts00005 | 43 | 1
5 | shard | public | my_sharded_table_ts00006 | 43 | 1
5 | shard | public | my_sharded_table_ts00007 | 43 | 1
5 | shard | public | my_sharded_table_ts00008 | 43 | 1
...
4 | shard | public | my_sharded_table_ts00507 | 43 | 1
4 | shard | public | my_sharded_table_ts00508 | 43 | 1
4 | shard | public | my_sharded_table_ts00509 | 43 | 1
4 | shard | public | my_sharded_table_ts00510 | 43 | 1
4 | shard | public | my_sharded_table_ts00511 | 43 | 1
4 | shard | public | my_sharded_table_ts00512 | 43 | 1
(516 rows)
The router's catalog contains only the table's mapping to the shards. The shards are complete databases with their own catalogs, where the partitions belonging to each shard are stored.
In the next posts of this series, we will examine colocation, reference tables, re-sharding, and transaction consistency.