Install extensions from PGDG repo to YugabyteDB - example with sequential_uuids

franckpachot

Franck Pachot

Posted on December 22, 2022

Install extensions from PGDG repo to YugabyteDB - example with sequential_uuids

In the previous post, I installed a PostgreSQL extension to YugabyteDB by building it. Some extensions are available in the PostgreSQL Global Development Group YUM repository. Here is how to get it from there into a YugabyteDB.

I'm using YugabyteDB 2.17, which is compatible with PostgreSQL 11.2, on Centos7. Here are the available .rpm:
https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-7-x86_64/

For Centos8, pleas look at the next post:

Example with sequential_uuids

Vlad Mihalcea asked if we have a Time-Sorted Unique Identifier in YugabyteDB. I answered that we probably don't need it but that there is probably a PostgreSQL extension that works with YugabyteDB:
https://twitter.com/FranckPachot/status/1600743186326245376

And, yes, there is, available from the YUM repository. I'll take this as an example.

I do all that in a temporary directory to avoid overwriting existing files

Download the RPM

I can download the .rpm from the online repo:

yum install -y wget
wget -c https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-7-x86_64/sequential_uuids_11-1.0.2-1.rhel7.x86_64.rpm

Enter fullscreen mode Exit fullscreen mode

or though YUM after installing the repo:

yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
yum install --downloadonly -y --downloaddir=$PWD sequential_uuids_11

Enter fullscreen mode Exit fullscreen mode

Dependencies

If downloaded directly, I can check the dependencies with rpm -qRp:
Image description

If downloaded with yum, the dependencies have been downloaded. I can also I can get them with yum -q deplist and they were displayed when downloading:

Image description

Note that I don't need the postgresql11 ones because I'll use YugabyteDB instead.

Extract the extension to YugabyteDB

My YugabyteDB installation dir is /home/yugabyte and the extension files (share/extensions/*.{control,sql} and libs/*.so) must go into /home/yugabyte/postgres.

Because the RPM installs it in /usr/pgsql-11, I create a symbolic link in my temporary directory and them with a relative path:

mkdir -p ./usr &&
ln -Ts /home/yugabyte/postgres ./usr/pgsql-11 &&
rpm2cpio ./sequential_uuids_11-*.rpm | 
 cpio -idmv --no-absolute-filenames

Enter fullscreen mode Exit fullscreen mode

Using -v, it displays the extracted files:

Image description

Thanks to the symbolic link, they are in the YugabyteDB postgres directory.

The temporary directory can be removed.

Testing the extension

You must install the extension on all nodes of your YugabyteDB server.

Then, when connected to any node, load the extension and play with it:

yugabyte=# create extension sequential_uuids;
CREATE EXTENSION

yugabyte=# create table demo ( id uuid, primary key(id asc) );

yugabyte=# insert into demo
             select uuid_time_nextval(5,x'ffff'::int)
             from generate_series(1,5)
            returning *;

yugabyte=# \watch
Thu 22 Dec 2022 08:24:15 AM UTC (every 2s)

                  id
--------------------------------------
 9da30789-478e-4abd-857e-92c60dfbd375
 9da3dd77-0f14-46ae-90ce-5ffb40531bae
 9da34a5d-f400-4496-a0f4-7db59c8790cb
 9da3f9c6-acb9-4f10-a9e8-498767290ca6
 9da35df5-1698-4371-88cb-077d10c678db
(5 rows)

Thu 22 Dec 2022 08:24:17 AM UTC (every 2s)

                  id
--------------------------------------
 9da3b4a3-448c-4f5f-b70c-d420175fc335
 9da39dac-494a-458a-a797-61d6c30a53ca
 9da30ae4-ecbe-4ab6-a795-3c56b1fe46ba
 9da3e66a-510e-45bd-a8b7-f1afaa84865c
 9da32dca-4a4d-48fb-ba20-b4c2f367fed2
(5 rows)

Thu 22 Dec 2022 08:24:19 AM UTC (every 2s)

                  id
--------------------------------------
 9da31ec6-78aa-493e-b0a9-382a5d92a54c
 9da3b406-1968-4b43-8fd7-a575e051d4f7
 9da34ea9-b6cd-4d17-8c88-f080ecf89cdd
 9da3f783-8cff-42bc-9ee9-98d264780718
 9da35c39-0c86-42fa-a33f-e66e47a11068
(5 rows)

Thu 22 Dec 2022 08:24:21 AM UTC (every 2s)

                  id
--------------------------------------
 9da4f46a-e427-45f2-a313-42f0fba948a2
 9da48fdd-fa08-42ac-9e5f-3413a4f5317d
 9da494bc-ec6b-493b-ac8f-d2f83a8f7be3
 9da4561f-568e-4713-b87e-2af989377c09
 9da477ac-02f8-46d9-b84b-3a8f54b31d98
(5 rows)

Thu 22 Dec 2022 08:24:23 AM UTC (every 2s)
Enter fullscreen mode Exit fullscreen mode

You see that the first 4 bytes (because 0xffff, the default, and the maximum) are the same within a 5 second interval (the first parameter which I've set to 5 instead of the default 60 seconds) with the others random.

Note that I have defined the primary key sharding as ASC rather than the default HASH because, if the goal is to colocate the rows inserted at the same time, you don't want to apply hash sharding.

I also verify that from concurrent sessions the prefix is the same:

for i in {1..10}
do
 ysqlsh -tAq -h $(hostname) -c 'insert into demo select uuid_time_nextval() returning *,now()'&
done
Enter fullscreen mode Exit fullscreen mode

Image description

Validating the compatibility with PostgreSQL

The best way to compare the behavior in YugabyteDB is to run the PostgreSQL regression test and compare with the expected output:


yum install -y git
git clone https://github.com/tvondra/sequential-uuids.git


ysqlsh -h $(hostname) --echo-all --quiet         \
 --file sequential-uuids/test/sql/uuids.sql      |
  sdiff sequential-uuids/test/expected/uuids.out -

Enter fullscreen mode Exit fullscreen mode

The only difference is because the regression test has no ORDER BY and the order of rows from a hash partitioned table (the default) in a distributed database is different than from the single-node heap table:
Image description
But the result is the same.

The full PostgreSQL-compatibility of the extension is confirmed by the regression tests.

About Sequential UUID in YugabyteDB

⚠️ You must think about the consequence in a Distributed SQL database before using a time-based UUID. Thanks to the LSM-Tree storage, YugabyteDB doesn't have the problems that sequential_uuids tries to solve (WAL write amplification, B-Tree fragmentation and clustering factor). If you want a UUID, then the one from pgcrypto (already installed in YugabyteDB) gen_random_uuid() is probably the right one.

In addition to that, having concurrent sessions touching the same key range will create a hotspot on one tablet. However, if there are not too many concurrent sessions, and the load is distributed with other tables, then maybe this UUID can benefit from colocation in DocDB and filesystem caches. But a cached sequence may be better as it keeps rows together from the same session, but distributes those from concurrent sessions.

Anyway, I've run the above for a while, with more rows inserted (from generate_series(1,100000)) and checked that tablet auto-split occurs:
Image description
As you see, one is still at Split Depth 1 because no Sequential UUID went into this range, and another has been split 7 times. So YugabyteDB still balanced the storage.

Summary on the extension installation

If you think a PostgreSQL extension is necessary for your application, you can use this example to test it. The same can be used to deploy it. But to be sure that it is supported, do not hesitate to open a git issue of ask in the slack channel.

💖 💪 🙅 🚩
franckpachot
Franck Pachot

Posted on December 22, 2022

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related