PostgreSQL and YugabyteDB connections and SQL latency
Frits Hoogland
Posted on October 24, 2022
Creating a connection and executing SQL are required in both PostgreSQL and YugabyteDB to perform work, and any request for work comes with a certain amount of latency, no matter how small the unit of work is. For optimal performance, it is important to choose a strategy that fits the performance needed.
Creating a connection
Creating a database connection adds latency in any database, even embedded ones. With embedded databases such as SQLite the overhead seems (and is) very low, but even there keeping a database session open performs better than forking a process to serve each request.
This article explains the latencies of and workarounds for creating connections in PostgreSQL and YugabyteDB. The main message of that article is: you should use an application connection pool to keep and reuse database connections, or you will suffer significantly higher latencies, even if you have an 'external' connection pooler such as pgbouncer.
An external connection pooler can improve latencies, but it does not give you the latencies of a persistent connection kept in an application connection pool, because with an application pool an already established connection does not need to be created again before performing SQL.
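To make the difference concrete, below is a minimal sketch in Rust that compares opening a new connection for every statement with reusing connections from an application-side pool. The postgres, r2d2 and r2d2_postgres crates are my choice for this sketch (they are not what the article measures with); the connection string and table t are taken from the examples further down.

use std::time::Instant;

use postgres::{Client, NoTls};
use r2d2::Pool;
use r2d2_postgres::PostgresConnectionManager;

const URL: &str = "host=192.168.66.100 port=5432 user=postgres password=postgres";

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Anti-pattern: create a fresh connection for every statement.
    let start = Instant::now();
    for _ in 0..5 {
        let mut client = Client::connect(URL, NoTls)?;
        client.query("select count(*) from t", &[])?;
    }
    println!("connect per statement: {:?}", start.elapsed());

    // Application-side pool: connections are created once and reused,
    // so only the initial pool fill pays the connection setup cost.
    let manager = PostgresConnectionManager::new(URL.parse()?, NoTls);
    let pool = Pool::builder().max_size(2).build(manager)?;
    let start = Instant::now();
    for _ in 0..5 {
        let mut client = pool.get()?;
        client.query("select count(*) from t", &[])?;
    }
    println!("pooled connections:    {:?}", start.elapsed());
    Ok(())
}

With the pool, the connection setup cost is paid once when the pool is filled; every statement after that reuses an already established connection and backend.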
First database access
After startup of a PostgreSQL or YugabyteDB postmaster, the very first connection to each database will cause that session to recreate the files that store the relation cache for the global catalogs as well as for the database it connected to.
This article explains this in much more detail. The main message is: this is very likely not a problem in a normally used database, but in cases where a PostgreSQL or YugabyteDB database cluster is built for testing, or where a single database is created for testing, you have to be aware that the first connection will show higher latencies, and thus should be kept out of latency calculations.
The catalog again
At this point a PostgreSQL or YugabyteDB connection is established and ready to perform its work, so the user's SQL can be executed. Is this the point at which SQL can be executed with stable latencies?
Sadly, no. The PostgreSQL, and therefore the YugabyteDB, architecture requires the phases of Parse, Rewrite, Plan and Execute for performing SQL. The Rewrite and Plan phases require metadata from the database to validate and optimise the SQL to be performed. That metadata is what the databases store in their catalogs.
Every backend caches catalog data independently in its private memory. After backend startup, the private catalog metadata is just enough to have the backend perform SQL, which means it excludes catalog data about tables and other objects in the database. As a result, when SQL accesses regular database objects, the backend must first fetch the metadata of all objects involved in the SQL from the catalog to be able to perform the phases for execution.
Once the metadata is loaded, it will remain cached in the backend. Performing SQL involving objects for which the catalog data is cached can reuse the cached metadata. DDL can invalidate the catalog cache.
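As a small illustration of this behaviour, here is a sketch of my own (using the postgres crate rather than the article's connection-profiler; the added column name extra is made up) that times the same statement twice in one session, runs a DDL statement, and times it again:

use std::time::Instant;

use postgres::{Client, NoTls};

fn timed_count(client: &mut Client, label: &str) -> Result<(), postgres::Error> {
    let start = Instant::now();
    client.simple_query("select count(*) from t")?;
    println!("{}: {} us", label, start.elapsed().as_micros());
    Ok(())
}

fn main() -> Result<(), postgres::Error> {
    let mut client = Client::connect(
        "host=192.168.66.100 port=5432 user=postgres password=postgres",
        NoTls,
    )?;
    // First execution: the backend loads the catalog metadata for table t.
    timed_count(&mut client, "first run")?;
    // Second execution: the metadata is cached in the backend, so it is faster.
    timed_count(&mut client, "second run")?;
    // DDL invalidates the cached catalog entries for the table...
    client.simple_query("alter table t add column extra int")?;
    // ...so this execution has to reload (part of) the metadata.
    timed_count(&mut client, "after ddl")?;
    Ok(())
}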
What does that mean for actual latencies? To see the effect of the above description, I am using a small utility I created for this kind of test: connection-profiler (this is Rust source code). It quite simply times connecting and SQL execution.
connection-profiler \
--url "host=192.168.66.100 port=5432 user=postgres password=postgres" \
--query 'select count(*) from t' \
--repeat-sql 5
create_connection 3052 us
------------------------------------------------------------
run nr: 1
total simple query 851 us
run nr: 2
total simple query 360 us
run nr: 3
total simple query 356 us
run nr: 4
total simple query 319 us
run nr: 5
total simple query 307 us
============================================================
The connection is the step that takes the most time. This shows the importance of a connection pool.
Then, in the same session, the identical SQL is executed repeatedly: 'select count(*) from t'. The SQL itself is not that important; what matters is the timing difference between the runs. The first execution, run number 1, takes significantly more time than the other executions.
However, these numbers alone are too little information to be absolutely sure about what is going on. These latencies can be caused by all sorts of things. The above execution uses the simple query protocol, which lets the database perform the steps of parse, rewrite, plan and execute all in one go, and thus times them as a single step.
Let's look at this by executing it using the extended query protocol:
connection-profiler \
--url "host=192.168.66.100 port=5432 user=postgres password=postgres" \
--query 'select count(*) from t' \
--repeat-sql 5 \
--protocol extended
create_connection 2636 us
------------------------------------------------------------
run nr: 1
parse+describe+sync 577 us
bind+execute+sync 581 us
total extended protocol 1204 us
run nr: 2
parse+describe+sync 247 us
bind+execute+sync 378 us
total extended protocol 668 us
run nr: 3
parse+describe+sync 284 us
bind+execute+sync 312 us
total extended protocol 650 us
run nr: 4
parse+describe+sync 204 us
bind+execute+sync 270 us
total extended protocol 507 us
run nr: 5
parse+describe+sync 242 us
bind+execute+sync 260 us
total extended protocol 537 us
============================================================
A Rust client connection performs a network roundtrip for the database-side Parse and Rewrite phases (which is called 'parse' on the client side, identical to one of the two phases it triggers on the server side), and a network roundtrip for Plan and Execute (which are called bind and execute on the client).
What is visible is that both the parse and the bind+execute steps take significantly more time for the first execution. This is not hard proof that the cause is loading catalog entries, but it is likely, especially since repeating the above 5 executions a few times gives reasonably equal latencies.
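You can approximate this split without connection-profiler by using an explicit prepared statement. In the postgres crate (my choice for this sketch), prepare roughly corresponds to the parse+describe+sync roundtrip, and executing the prepared statement to the bind+execute+sync roundtrip:

use std::time::Instant;

use postgres::{Client, NoTls};

fn main() -> Result<(), postgres::Error> {
    let mut client = Client::connect(
        "host=192.168.66.100 port=5432 user=postgres password=postgres",
        NoTls,
    )?;

    for run in 1..=5 {
        let t0 = Instant::now();
        // Sends the statement text to the server: Parse/Describe/Sync.
        let statement = client.prepare("select count(*) from t")?;
        let t1 = Instant::now();
        // Executes the prepared statement: Bind/Execute/Sync.
        let _rows = client.query(&statement, &[])?;
        let t2 = Instant::now();
        println!(
            "run nr: {} parse {} us, bind+execute {} us",
            run,
            (t1 - t0).as_micros(),
            (t2 - t1).as_micros()
        );
    }
    Ok(())
}

The absolute numbers will not match connection-profiler exactly, since the crate adds its own handling around the protocol messages, but the pattern should be the same: the first run is slower in both phases, and subsequent runs settle down.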
If you closely studied the figures of the two different runs above, you will have found that the extended protocol figures are considerably worse (showing higher latencies) than the simple query protocol. This is because I am using minimalistic SQL, which makes the network time of flight the dominant part of the latency. If more data is pushed to or pulled from the database, the database time in the execution phase will increase, reducing the relative share of networking.
Conclusion
If you want to use database connections with PostgreSQL, and therefore YugabyteDB, efficiently and get consistent SQL response times, a dedicated application-side connection pool is required.
Even if a dedicated application-side connection pool is used, the response time of SQL can still fluctuate if the backend does not have the metadata of the involved database objects cached, and therefore must load it before it can perform the steps to execute the SQL. Backends cannot share their loaded/cached catalog data with each other.
If the required catalog data is not present in the backend, it must be fetched before the SQL can be processed, which increases the latency of the overall execution. This increase is shown in the examples above. The relative impact of loading catalog data depends on how much time is spent processing actual data in the execution phase.
This means that if a connection pool is used but does not keep a database client connected to a single database backend, performing SQL might require requesting catalog data in a lot of cases, increasing latency.
If you want to see or measure this in your own environment, take connection-profiler for a spin and see how it behaves there.