Matteo Crosta
Posted on July 28, 2023
Table partitioning is a highly effective technique used to improve the performance of very large database tables. By dividing the table's content into smaller sub-tables, known as partitions, the overall size of the table is reduced, leading to significant performance enhancements.
When should you partition a table?
If your table is too big for your server's RAM.
You should consider partitioning it: when a table reaches a few gigabytes in size, it's time to split it up.If you're working with massive amounts of data
Don't bother partitioning until you've got millions of records. Otherwise, you won't see much of a performance boost.If your table can be logically broken down into smaller chunks, Example: you have a table full of server logs. You could split them up by date, so all the logs from the same day are in one single partition. This makes it much easier to do tasks like deleting old logs by just dropping the partition.
Available partitioning types
Postgres has built-in support for three types of partitioning covering the most common use cases.
Partition by Range
The table is partitioned into “ranges” defined by a key column or a set of columns, with no overlap between the ranges of values assigned to different partitions
In the following example, the people table will be partitioned by birth_date
CREATE TABLE people (
id int not null,
birth_date date not null,
country_code character(2) not null,
name text
) PARTITION BY RANGE (birth_date);
CREATE TABLE people_y2000 PARTITION OF people
FOR VALUES FROM ('2000-01-01') TO ('2001-01-01');
CREATE TABLE people_y2001 PARTITION OF people
FOR VALUES FROM ('2001-01-01') TO ('2002-01-01');
CREATE TABLE people_y2002 PARTITION OF people
FOR VALUES FROM ('2002-01-01') TO ('2003-01-01');
Let’s try it:
INSERT INTO people (id, birth_date, country_code, name) VALUES
(1, '2000-01-01', 'US', 'John'),
(2, '2000-02-02', 'IT', 'Jane'),
(3, '2001-03-03', 'FR', 'Bob');
> INSERT 0 3
SELECT schemaname,relname,n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
schemaname | relname | n_live_tup
------------+--------------+------------
public | people_y2000 | 2
public | people_y2001 | 1
public | people_y2002 | 0
As you can see we inserted three records into the master table, people. Since the table is partitioned by birth_date, two records have been added to partition people_y2000, one into people_y2001 while people_y2002 is still empty.
Partition by List
The table is partitioned by explicitly listing which key values appear in each partition.
Taking the same example, let’s add a country_code column and use it as the partitioning key
CREATE TABLE people (
id int not null,
birth_date date not null,
country_code character(2) not null,
name text
) PARTITION BY LIST (country_code);
-- Partition for people living in Europe
CREATE TABLE people_EU PARTITION OF people
FOR VALUES IN ('AT', 'DE', 'IT', 'FR', 'ES', ..... );
-- Partition for people living in United States
CREATE TABLE people_US PARTITION OF people
FOR VALUES IN ('US');
Let’s try it:
INSERT INTO people (id, birth_date, country_code, name) VALUES
(1, '2000-01-01', 'US', 'John'),
(2, '2000-02-02', 'IT', 'Jane'),
(3, '2001-03-03', 'FR', 'Bob');
> INSERT 0 3
SELECT schemaname,relname,n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
schemaname | relname | n_live_tup
------------+-----------+------------
public | people_eu | 2
public | people_us | 1
Again, PostgreSQL moved every row to the correct partition.
Partition by Hash
The table is partitioned by specifying a modulus and a remainder for each partition. Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder.
This type is useful when we can’t logically divide our data, but we can only reduce the table size by spreading rows into many smaller partitions.
The following SQL will divide people into three tables, every table will contain (almost) the same number of rows.
CREATE TABLE people (
id int not null,
birth_date date not null,
country_code character(2) not null,
name text
) PARTITION BY HASH (id);
CREATE TABLE people_1 PARTITION OF people
FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE people_2 PARTITION OF people
FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE people_3 PARTITION OF people
FOR VALUES WITH (MODULUS 3, REMAINDER 2);
Let’s try it:
INSERT INTO people (id, birth_date, country_code, name) VALUES
(1, '2000-01-01', 'US', 'John'),
(2, '2000-02-02', 'IT', 'Jane'),
(3, '2001-03-03', 'FR', 'Bob');
> INSERT 0 3
SELECT schemaname,relname,n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
schemaname | relname | n_live_tup
------------+----------+------------
public | people_1 | 1
public | people_2 | 1
public | people_3 | 1
As you can see, the three records have been evenly split across all the partitions available.
Default partition
What happens when you try to insert a record that can’t fit into any partition?
Let’s go back to the people table defined in the list partitioning chapter and try to add Linda, from Canada:
INSERT INTO people (id, birth_date, country_code, name) VALUES
(4, '2002-04-04', 'CA', 'Linda');
ERROR: no partition of relation "people" found for rowDETAILS: Partition key of the failing row contains (country_code) = (CA).
The INSERT will fail because PostgreSQL doesn’t know where to add that record.
The most obvious solution would be to add a new partition, but if we have to do it for every country in the world, we would end up with hundreds of tables with a very small number of records. Not really nice.
Luckily, it’s possible to define a DEFAULT partition!
CREATE TABLE people_default PARTITION OF people DEFAULT;
Trying the same inserts again, it will result in:
INSERT INTO people (id, birth_date, country_code, name) VALUES
(1, '2000-01-01', 'US', 'John'),
(2, '2000-02-02', 'IT', 'Jane'),
(3, '2001-03-03', 'FR', 'Bob'),
(4, '2002-04-04', 'CA', 'Linda');
> INSERT 0 4
schemaname | relname | n_live_tup
------------+----------------+------------
public | people_eu | 2
public | people_us | 1
public | people_default | 1
As you can see, Linda has now been added to people_default.
Sub Partitioning
A single partition can also be a partitioned table!
Back to the LIST example, we can imagine that people_EU will contain a lot of records, so we may want to subdivide it by hash:
CREATE TABLE people (
id int not null,
birth_date date not null,
country_code character(2) not null,
name text
) PARTITION BY LIST (country_code);
CREATE TABLE people_US PARTITION OF people
FOR VALUES IN ('US');
CREATE TABLE people_EU PARTITION OF people
FOR VALUES IN ('AT', 'DE', 'IT', 'FR', 'ES', ..... )
PARTITION BY HASH (id);
CREATE TABLE people_EU_1 PARTITION OF people_EU
FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE people_EU_2 PARTITION OF people_EU
FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE people_EU_3 PARTITION OF people_EU
FOR VALUES WITH (MODULUS 3, REMAINDER 2);
This will result in:
INSERT INTO people (id, birth_date, country_code, name) VALUES
(1, '2000-01-01', 'US', 'John'),
(2, '2000-02-02', 'IT', 'Jane'),
(3, '2001-03-03', 'FR', 'Bob');
> INSERT 0 3
SELECT schemaname,relname,n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
schemaname | relname | n_live_tup
------------+-------------+------------
public | people_eu_2 | 1
public | people_eu_1 | 1
public | people_us | 1
public | people_eu_3 | 0
Partition operations
Attaching and detaching partitions
As we have seen, single partitions can be created and dropped whenever we want, but what if we want to exclude some records from the master table without deleting them?
The answer is: DETACH
ALTER TABLE people DETACH PARTITION people_us;
A detached partition will act as a normal table, so it will be possible to insert records that would violate the partition constraints.
The reverse operation, ATTACH, is as easy as:
ALTER TABLE people ATTACH PARTITION people_us FOR VALUES IN ('US');
Indexing
Too bad, PostgreSQL doesn’t allow to create a single index covering every partition of the table, but you have to create an index for every partition.
The bad news about this is that the primary key, or any other unique index, must include the columns used on the partition by
statement.
-- THIS WON'T WORK
CREATE UNIQUE INDEX idx_uniq ON people (id);
> ERROR: insufficient columns in UNIQUE constraint definition
> DETAILS: UNIQUE constraint on table "people" lacks column "country_code" which is part of the partition key.
-- THIS WORKS!
CREATE UNIQUE INDEX idx_uniq ON people (id, country_code);
> CREATE INDEX
The reason behind this is the fact that every partition is treated as an independent table, so adding the partition key to the index is the only way to grant the uniqueness of a record across the whole table.
Note that creating an index on the master table will automatically replicate it to every attached partition:
CREATE UNIQUE INDEX idx_uniq ON people (id, country_code);
-- Check created indexes
SELECT tablename, indexname FROM pg_indexes
WHERE schemaname = 'public' ORDER BY tablename, indexname;
tablename | indexname
---------------+-------------------------------------
people | idx_uniq
people_default | people_default_id_country_code_idx
people_eu | people_eu_id_country_code_idx
And that's all you need to know!
Follow me if you liked this article or leave a comment.
Posted on July 28, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
November 30, 2024
November 29, 2024
November 29, 2024