π hash+hash partitioning+sharding
Franck Pachot
Posted on June 22, 2022
In YugabyteDB there is partitioning at SQL level, the PostgreSQL declarative partitioning, and there is sharding at table/index/partition level, the automatic hash or range sharding. I explained those in a previous blog post. In short, Sharding is used to automatically distribute data across the cluster. Partitioning is used to group data together. For example, you partition by range on a timestamp to colocate on time windows for information lifecycle management. Or you group on a list of countries to map to specific regions for data residency. Or you group on a list of tenants to isolate them from the others.
PostgreSQL has also hash partitioning, but you probably don't use it in YugabyteDB because its main goal is distribution, and this is better achieved with sharding to tablets: more partitions, automatic rebalance, global indexes,...
But, never say never, what if you think you need hash partitioning plus hash sharding? It is important to be sure that their hashing method work well together to distributed data. Basically, I want that within each partition, the sharding to tablets is well balanced. One way is to look at the algorithm. Another is to test only your specific datatypes.
Here, I'll create a table with n
partitions, fill it with one million rows, and look at the yb_hash_code()
in each, which is the YugabyteDB hash function.
This is the script I've run, doing this from 1 to 50 partitions:
c=1000000
for n in {1..50} ; do
echo "
$n Partitions, $c rows:"
{
cat <<SQL
drop table demo;
create extension if not exists pgcrypto;
create table demo (id uuid default gen_random_uuid(), val int) partition by hash(id);
SQL
for i in $(seq 0 $(( $n -1 ))) ; do
cat <<SQL
create table demo$i partition of demo for values with (modulus $n , remainder $i);
SQL
done
cat <<SQL
insert into demo ( val) select generate_series(1,$c);
SQL
for i in $(seq 0 $(( $n -1 ))) ; do
cat <<SQL
select format('Partition %s / %s : min= %s max= %s -> %s %% rows',to_char($i,'99'),to_char($n,'99'),min(yb_hash_code(id)),max(yb_hash_code(id)),100*count(*)/$c) from demo$i;
SQL
done
} | psql -p 5433
done | grep Partition | tee hash-hash.log
The result:
1 Partitions, 1000000 rows:
Partition 0 / 1 : min= 0 max= 65535 -> 100 % rows
2 Partitions, 1000000 rows:
Partition 0 / 2 : min= 0 max= 65535 -> 49 % rows
Partition 1 / 2 : min= 0 max= 65535 -> 50 % rows
3 Partitions, 1000000 rows:
Partition 0 / 3 : min= 0 max= 65535 -> 33 % rows
Partition 1 / 3 : min= 0 max= 65535 -> 33 % rows
Partition 2 / 3 : min= 0 max= 65535 -> 33 % rows
4 Partitions, 1000000 rows:
Partition 0 / 4 : min= 0 max= 65535 -> 25 % rows
Partition 1 / 4 : min= 0 max= 65535 -> 24 % rows
Partition 2 / 4 : min= 0 max= 65535 -> 24 % rows
Partition 3 / 4 : min= 0 max= 65535 -> 24 % rows
5 Partitions, 1000000 rows:
Partition 0 / 5 : min= 0 max= 65535 -> 19 % rows
Partition 1 / 5 : min= 0 max= 65535 -> 20 % rows
Partition 2 / 5 : min= 0 max= 65535 -> 20 % rows
Partition 3 / 5 : min= 0 max= 65535 -> 19 % rows
Partition 4 / 5 : min= 0 max= 65535 -> 20 % rows
6 Partitions, 1000000 rows: [183/1982]
Partition 0 / 6 : min= 0 max= 65535 -> 16 % rows
Partition 1 / 6 : min= 0 max= 65535 -> 16 % rows
Partition 2 / 6 : min= 0 max= 65535 -> 16 % rows
Partition 3 / 6 : min= 0 max= 65535 -> 16 % rows
Partition 4 / 6 : min= 0 max= 65535 -> 16 % rows
Partition 5 / 6 : min= 0 max= 65534 -> 16 % rows
7 Partitions, 1000000 rows:
Partition 0 / 7 : min= 0 max= 65535 -> 14 % rows
Partition 1 / 7 : min= 0 max= 65535 -> 14 % rows
Partition 2 / 7 : min= 0 max= 65535 -> 14 % rows
Partition 3 / 7 : min= 0 max= 65535 -> 14 % rows
Partition 4 / 7 : min= 0 max= 65535 -> 14 % rows
Partition 5 / 7 : min= 0 max= 65535 -> 14 % rows
Partition 6 / 7 : min= 0 max= 65534 -> 14 % rows
8 Partitions, 1000000 rows:
Partition 0 / 8 : min= 2 max= 65535 -> 12 % rows
Partition 1 / 8 : min= 0 max= 65535 -> 12 % rows
Partition 2 / 8 : min= 0 max= 65535 -> 12 % rows
Partition 3 / 8 : min= 0 max= 65535 -> 12 % rows
Partition 4 / 8 : min= 0 max= 65535 -> 12 % rows
Partition 5 / 8 : min= 0 max= 65535 -> 12 % rows
Partition 6 / 8 : min= 0 max= 65535 -> 12 % rows
Partition 7 / 8 : min= 0 max= 65535 -> 12 % rows
...
50 Partitions, 1000000 rows:
Partition 0 / 50 : min= 2 max= 65534 -> 1 % rows
Partition 1 / 50 : min= 2 max= 65531 -> 2 % rows
Partition 2 / 50 : min= 5 max= 65527 -> 1 % rows
Partition 3 / 50 : min= 3 max= 65534 -> 2 % rows
Partition 4 / 50 : min= 2 max= 65532 -> 2 % rows
Partition 5 / 50 : min= 0 max= 65530 -> 1 % rows
Partition 6 / 50 : min= 6 max= 65534 -> 2 % rows
Partition 7 / 50 : min= 0 max= 65526 -> 2 % rows
Partition 8 / 50 : min= 3 max= 65535 -> 2 % rows
Partition 9 / 50 : min= 3 max= 65535 -> 1 % rows
Partition 10 / 50 : min= 4 max= 65534 -> 1 % rows
Partition 11 / 50 : min= 2 max= 65535 -> 1 % rows
Partition 12 / 50 : min= 2 max= 65533 -> 2 % rows
Partition 13 / 50 : min= 0 max= 65535 -> 1 % rows
Partition 14 / 50 : min= 14 max= 65529 -> 2 % rows
Partition 15 / 50 : min= 0 max= 65530 -> 1 % rows
Partition 16 / 50 : min= 0 max= 65535 -> 2 % rows
Partition 17 / 50 : min= 4 max= 65528 -> 1 % rows
Partition 18 / 50 : min= 1 max= 65533 -> 1 % rows
Partition 19 / 50 : min= 6 max= 65532 -> 1 % rows
Partition 20 / 50 : min= 0 max= 65533 -> 2 % rows
Partition 21 / 50 : min= 0 max= 65535 -> 1 % rows
Partition 22 / 50 : min= 4 max= 65534 -> 2 % rows
Partition 23 / 50 : min= 1 max= 65529 -> 2 % rows
Partition 24 / 50 : min= 2 max= 65530 -> 2 % rows
Partition 25 / 50 : min= 5 max= 65529 -> 2 % rows
Partition 26 / 50 : min= 3 max= 65518 -> 1 % rows
Partition 27 / 50 : min= 4 max= 65534 -> 1 % rows
Partition 28 / 50 : min= 0 max= 65529 -> 1 % rows
Partition 29 / 50 : min= 6 max= 65530 -> 2 % rows
Partition 30 / 50 : min= 7 max= 65522 -> 2 % rows
Partition 31 / 50 : min= 11 max= 65533 -> 2 % rows
Partition 32 / 50 : min= 2 max= 65534 -> 1 % rows
Partition 33 / 50 : min= 1 max= 65526 -> 2 % rows
Partition 34 / 50 : min= 3 max= 65535 -> 2 % rows
Partition 35 / 50 : min= 4 max= 65534 -> 2 % rows
Partition 36 / 50 : min= 2 max= 65534 -> 2 % rows
Partition 37 / 50 : min= 0 max= 65528 -> 1 % rows
Partition 38 / 50 : min= 2 max= 65532 -> 2 % rows
Partition 39 / 50 : min= 0 max= 65534 -> 2 % rows
Partition 40 / 50 : min= 1 max= 65527 -> 2 % rows
Partition 41 / 50 : min= 1 max= 65535 -> 1 % rows
Partition 42 / 50 : min= 0 max= 65535 -> 2 % rows
Partition 43 / 50 : min= 5 max= 65533 -> 2 % rows
Partition 44 / 50 : min= 7 max= 65533 -> 1 % rows
Partition 45 / 50 : min= 10 max= 65534 -> 2 % rows
Partition 46 / 50 : min= 2 max= 65534 -> 2 % rows
Partition 47 / 50 : min= 4 max= 65534 -> 1 % rows
Partition 48 / 50 : min= 1 max= 65526 -> 1 % rows
Partition 49 / 50 : min= 4 max= 65528 -> 2 % rows
In each partition, I have the whole range of hash codes (from 0 to 65535) which are used for sharding (by ranges of hash code), and the distribution of rows to partitions is well balanced.
I take one partition and look at the yb_hash_code() repartition:
yugabyte=#
select n,count(h) from (
select yb_hash_code(id) h,count(*) n
from demo42 group by yb_hash_code(id)
) x group by n order by n;
n | count
---+-------
1 | 14791
2 | 2216
3 | 233
4 | 18
5 | 3
(5 rows)
For 14791 rows within the 20009 of this partition, a yb_hash_code() maps to one row only. And there's only 3 yb_hash_codes() that map to 5 rows. This shows a good distribution within a partition, even when the partition is already a result of hash partitioning.
Each partition looks like this:
Here, this partition is split to 3 tablets, in 3 ranges of yb_hash_code()
: hash_split: [0x0000, 0x5555)
goes from 0 to 21844, hash_split: [0x5555, 0xAAAA)
from 21845 to 43689 and hash_split: [0xAAAA, 0xFFFF]
from 43690 to 65535. With more data, they will be split further.
Note that the hashing algorithm is very different: PostgreSQL uses a modulo on hash value. For this, you must know the number of partition from the beginning. YugabyteDB uses ranges, which makes it easier to split further. Here is a good read about the difference: https://ben.kirw.in/2018/12/02/hash-range-partitioning/
In summary, Hash Partitioning + Hash Sharding works to distribute data. You can test if for your partition key and your datatypes. But, most important, I think you should not need it. Hash Sharding alone should be sufficient. With hash sharding, you can have many tablets. With a 65535 hash code value, and the recommended tablet count and size, you can distribute very large tables. But if you think you do, I'm interested to know about the use case.
Posted on June 22, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.