Franck Pachot
Posted on November 26, 2024
It is important to clarify that Aurora Limitless is not a single distributed SQL database. Instead, it functions as a sharding layer that operates over multiple PostgreSQL databases. While all SQL features are available within a single shard, not all features can be accessed through the sharding layer.
No PostgreSQL partitioning
Aurora Limitless uses PostgreSQL partitioning to implement the sharding layer, which means it's not possible to partition shared tables.
In a PostgreSQL application, you can partition a table by range of dates, simplifying lifecycle management. This approach allows for easier archiving or purging of old partitions instead of performing lengthy delete operations. However, this functionality is not available in Aurora Limitless.
postgres_limitless=> set rds_aurora.limitless_create_table_mode=sharded;
SET
postgres_limitless=> set rds_aurora.limitless_create_table_shard_key='{"id"}';
SET
postgres_limitless=> create table orders (
id uuid default gen_random_uuid()
, year int
, order_id int
) partition by range(year);
ERROR: PARTITION BY is not supported
Aurora Limitless utilizes PostgreSQL partitioning to present the shards to the routers and to slice the tables within each shard, making additional user-defined partitioning impossible.
Limited Unique Constraints and Primary Keys
In SQL databases, many tables use both a natural key and a surrogate key. The natural key, composed of values provided by the user, is often utilized for quick data access. However, it may not be the best choice for a primary key since it can be unknown at the time of insert and may change over time. For example, in an "orders" table, the natural key might combine the fiscal "year" with an "order_id." This "order_id" could be assigned after the initial insert to ensure a continuous numbering sequence without gaps.
In contrast, the surrogate key serves as the primary key, providing a unique identifier for the row as soon as it is added and remaining unchanged for use in foreign keys.
If the surrogate key "id" is used as a sharding key, it is not possible to declare a unique constraint on the natural key:
postgres_limitless=> set rds_aurora.limitless_create_table_mode=sharded;
SET
postgres_limitless=> set rds_aurora.limitless_create_table_shard_key='{"id"}';
SET
postgres_limitless=> create table orders (
id uuid default gen_random_uuid()
, year int
, order_id int
);
CREATE TABLE
postgres_limitless=> alter table orders add constraint orders_surrogate_key primary key (id);
ALTER TABLE
postgres_limitless=> alter table orders add constraint orders_natural_key unique (year, order_id);
ERROR: UNIQUE constraint on sharded table must include the shard key
DETAIL: UNIQUE constraint on table "orders" lacks column "id" which is part of the shard key.
This limitation arises from using PostgreSQL partitioning to shard the table. Enforcing a unique constraint requires a unique index, but PostgreSQL does not support global indexes. Instead, it can only create one index per partition, which means it cannot enforce a unique constraint at a global level. Aurora Limitless has the same limitation: no global indexes or unique constraints across multiple shards.
To declare the natural key as a unique constraint, I can make "year" the sharding key:
postgres_limitless=> set rds_aurora.limitless_create_table_mode=sharded;
SET
postgres_limitless=> set rds_aurora.limitless_create_table_shard_key='{"year"}';
SET
postgres_limitless=> create table orders (
id uuid default gen_random_uuid()
, year int
, order_id int
);
CREATE TABLE
postgres_limitless=> alter table orders add constraint orders_natural_key unique (year, order_id);
ALTER TABLE
postgres_limitless=> alter table orders add constraint orders_surrogate_key primary key (id);
ERROR: PRIMARY KEY constraint on sharded table must include the shard key
DETAIL: PRIMARY KEY constraint on table "orders" lacks column "year" which is part of the shard key.
I was able to declare the combination of "(year, order_id)" as unique, but I cannot declare "id" as unique. This situation might be acceptable since we generate it as a UUID, which is intended to be unique. However, this also means that we do not have quick access to the surrogate key and cannot use it for foreign key references.
I can only create a non-unique index on the surrogate key for faster access. Unfortunately, this type of index cannot enforce uniqueness and cannot be utilized by foreign keys.
postgres_limitless=> create index on orders(id);
CREATE INDEX
This is a significant limitation of database sharding in general. This design works well when the data model consistently utilizes a sharding key, such as in multi-tenant applications where the tenant ID is present in all table keys.
Limited Referential Integrity Foreign Keys
Another limitation to be aware of is that while foreign keys are available within each shard (since they are PostgreSQL databases) they cannot be established at a global level. This means a foreign key cannot reference a key that might be stored in another shard.
The following scenario works because both the child table and the parent table share the same sharding key:
postgres_limitless=> set rds_aurora.limitless_create_table_mode=sharded;
SET
postgres_limitless=> set rds_aurora.limitless_create_table_shard_key='{"year"}';
SET
postgres_limitless=> create table orders_items (
primary key ( year, order_id, item_id )
, year int
, order_id int
, item_id int
);
CREATE TABLE
This approach is effective and ideal for achieving optimal single-shard joins:
postgres_limitless=> explain(generic_plan) select *
from orders_items
join orders using(year, order_id)
where id=$1
;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=100.00..328.98 rows=2 width=28)
-> Async Foreign Scan (cost=100.00..164.48 rows=1 width=28)
Remote Plans from Shard 5:
Append (cost=4.37..4098.21 rows=256 width=28)
-> Nested Loop (cost=4.37..16.00 rows=1 width=28)
-> Bitmap Heap Scan on orders_ts00001 r6_1 (cost=4.21..14.37 rows=8 width=24)
Recheck Cond: (id = $1::uuid)
-> Bitmap Index Scan on orders_ts00001_id_idx (cost=0.00..4.21 rows=8 width=0)
Index Cond: (id = $1::uuid)
-> Index Only Scan using orders_items_ts00001_pkey on orders_items_ts00001 r4_1 (cost=0.15..0.19 rows=1 width=12)
Index Cond: ((year = r6_1.year) AND (order_id = r6_1.order_id))
-> Nested Loop (cost=4.37..16.00 rows=1 width=28)
-> Bitmap Heap Scan on orders_ts00002 r6_2 (cost=4.21..14.37 rows=8 width=24)
Recheck Cond: (id = $1::uuid)
-> Bitmap Index Scan on orders_ts00002_id_idx (cost=0.00..4.21 rows=8 width=0)
Index Cond: (id = $1::uuid)
-> Index Only Scan using orders_items_ts00002_pkey on orders_items_ts00002 r4_2 (cost=0.15..0.19 rows=1 width=12)
Index Cond: ((year = r6_2.year) AND (order_id = r6_2.order_id))
...
-> Nested Loop (cost=4.37..16.00 rows=1 width=28)
-> Bitmap Heap Scan on orders_ts00256 r6_256 (cost=4.21..14.37 rows=8 width=24)
Recheck Cond: (id = $1::uuid)
-> Bitmap Index Scan on orders_ts00256_id_idx (cost=0.00..4.21 rows=8 width=0)
Index Cond: (id = $1::uuid)
-> Index Only Scan using orders_items_ts00256_pkey on orders_items_ts00256 r4_256 (cost=0.15..0.19 rows=1 width=12)
Index Cond: ((year = r6_256.year) AND (order_id = r6_256.order_id))
Relations: (orders_items_fs00001 orders_items_1) INNER JOIN (orders_fs00001 orders_1)
-> Async Foreign Scan (cost=100.00..164.48 rows=1 width=28)
Remote Plans from Shard 4:
Append (cost=4.37..4098.21 rows=256 width=28)
-> Nested Loop (cost=4.37..16.00 rows=1 width=28)
-> Bitmap Heap Scan on orders_ts00257 r7_1 (cost=4.21..14.37 rows=8 width=24)
Recheck Cond: (id = $1::uuid)
-> Bitmap Index Scan on orders_ts00257_id_idx (cost=0.00..4.21 rows=8 width=0)
Index Cond: (id = $1::uuid)
-> Index Only Scan using orders_items_ts00257_pkey on orders_items_ts00257 r5_1 (cost=0.15..0.19 rows=1 width=12)
Index Cond: ((year = r7_1.year) AND (order_id = r7_1.order_id))
...
-> Nested Loop (cost=4.37..16.00 rows=1 width=28)
-> Bitmap Heap Scan on orders_ts00512 r7_256 (cost=4.21..14.37 rows=8 width=24)
Recheck Cond: (id = $1::uuid)
-> Bitmap Index Scan on orders_ts00512_id_idx (cost=0.00..4.21 rows=8 width=0)
Index Cond: (id = $1::uuid)
-> Index Only Scan using orders_items_ts00512_pkey on orders_items_ts00512 r5_256 (cost=0.15..0.19 rows=1 width=12)
Index Cond: ((year = r7_256.year) AND (order_id = r7_256.order_id))
Relations: (orders_items_fs00002 orders_items_2) INNER JOIN (orders_fs00002 orders_2)
(3593 rows)
This approach is particularly beneficial for weak entities (also known as compositions in UML) that are often queried together, those that are stored as a single hierarchical document in NoSQL databases. When those documents are stored in multiple tables, Aurora Limitless facilitates the colocation of data for single-shard joins, which improves performance. However, it is essential to understand that not all aspects of SQL referential integrity can be maintained with database sharding. For example, an order may reference another order, like the previous order it replaces, which might not necessarily fall within the same year:
postgres_limitless=> alter table orders
add replaced_order_year int,
add replaced_order_id int
;
ALTER TABLE
postgres_limitless=> alter table orders
add constraint orders_to_replaced
foreign key (replaced_order_year, replaced_order_id)
references orders (year, order_id)
;
ERROR: FOREIGN KEY constraint on sharded table must include the shard key
DETAIL: FOREIGN KEY constraint on table "orders" lacks column "year" which is part of the shard key.
A foreign key constraint is impossible in this case because the referenced row may belong to a different shard.
This highlights the advantages and disadvantages of database sharding. Sharding is effective when a single sharding key is utilized, as it allows related rows to be stored together. However, it cannot be implemented in existing PostgreSQL applications that do not have a defined sharding key. To successfully use sharding, the application must be specifically designed for it. Once this is done, Aurora Limitless can significantly simplify operations compared to manual sharding.
In the upcoming posts, we will explore when foreign keys can be used. Specifically, we will discuss situations where the reference table is ensured to reside on the same shard by co-locating it with the same sharding key when applicable, or by broadcasting reference tables across all shards.
Posted on November 26, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.