YugabyteDB is Distributed SQL: resilient and consistent
Franck Pachot
Posted on August 15, 2023
In my previous article, I conducted a pgbench
test on Citus to underscore its limitations in two key areas:
-
PostgreSQL compatible API: In Citus, creating tables demands the invocation of additional functions alongside the standard SQL
CREATE TABLE
Data Definition Language. - PostgreSQL ACID properties: Citus exhibits eventual consistency in its read operations, and the global database attains a consistent state only when the application is offline.
Distributed SQL sets itself apart by offering comprehensive SQL features on a global scale. It operates as a unified logical database, with data storage and transactions dispersed across multiple nodes. Notably, YugabyteDB extends its value by ensuring PostgreSQL compatibility, enabling application DDL and DML to function seamlessly without necessitating any code modifications.
Startup a YugabyteDB cluster
In this context, all nodes possess equal capabilities to accept PostgreSQL connections, handle SQL processing, manage transactions, distribute table rows and index entries, and store their part in their local storage.
I create a network and start the first node, which I also use to expose the UI on http://n1.yb:15433/
docker network create yb
docker run -d --network yb --hostname n1 -p15433:15433 yugabytedb/yugabyte \
yugabyted start --background=false --ui=true
I check the status (yugabyted status
)
Then I add as many nodes as I want, with a minimum of 3 to replicate for fault tolerance:
for i in {2..6}
do
docker run -d --network yb --hostname n$i yugabytedb/yugabyte \
yugabyted start --background=false --join n1.yb
done
It has switched automatically to Replication Factor 3 for resilience to node failure:
The UI shows the summary with 6 nodes:
Create pgbench tables
Given YugabyteDB's PostgreSQL compatibility, I run the pgbench
utility from the PostgreSQL image, connecting to any of the nodes I've launched, as they are all equal, to create the tables:
docker run -it --network yb \
-e PGHOST=n4.yb -e PGUSER=yugabyte -e PGPORT=5433 \
postgres pgbench -iIdtGp yugabyte
I've omitted the VACUUM step (-Iv
) from the initialization phases, as YugabyteDB doesn't necessitate VACUUM operations. You are welcome to retain it, but be aware that it will result in a WARNING: VACUUM will be ignored
. In YugabyteDB, the storage mechanism utilizes the cluster Hybrid Logical Clock for sequencing transactions and operations on table rows, and index entries. While this might appear intricate, it simplifies various aspects: there's no requirement to fine-tune autovacuum, no occurrence of bloat, and no concerns regarding xid wraparound.
There's a similar warning about the FILLFACTOR, which is not needed either:
Indeed, Heap Only Tuple (HOT) updates are unnecessary in this context. The tables are stored within their Primary Key LSM-Tree, which allows for solely writing the new column values without duplicating the entire row. This design reduces bloat and obviates the need for HOT updates.
There is nothing else to do. The tables and indexes are already efficiently distributed to "Tablets" and replicated using the Raft consensus algorithm, ensuring data consistency across "Leaders" and their "Peers."
Run pgbench transactions
I execute transactions over a 10-second interval (-T 10
), similar to the previous article. The sole divergence is the omission of the -n
parameter, skipping the unnecessary VACUUM step:
docker run -it --network yb \
-e PGHOST=n4.yb -e PGUSER=yugabyte -e PGPORT=5433 \
postgres pgbench -nT 10 yugabyte
While this is running, I check the UI, which shows that reads and writes are distributed on all nodes:
All transactions were processed with no errors:
Run OLTP workload: strong consistency
Let's run pgbench
in the background and check how the tables are consistent:
docker run -it --network yb \
-e PGHOST=n4.yb -e PGUSER=yugabyte -e PGPORT=5433 \
postgres pgbench -nT 60 yugabyte &
docker run -i --network yb \
-e PGHOST=n4.yb -e PGUSER=yugabyte -e PGPORT=5433 \
postgres psql yugabyte <<'SQL'
select a.sum, b.sum, t.sum , a.sum = b.sum and b.sum = t.sum
from
( select sum(abalance) from pgbench_accounts ) as a
,( select sum(bbalance) from pgbench_branches ) as b
,( select sum(tbalance) from pgbench_tellers ) as t
;
\watch 1
SQL
As anticipated, the behavior observed reaffirms YugabyteDB's status as a Distributed SQL database. Both reads and writes exhibit consistent behavior, in line with the expectations.
The execution proceeded seamlessly without encountering any failures. Remarkably, the latency and throughput remained comparable to the results obtained during the run on Citus, as detailed in the preceding post.
It's important to underscore that YugabyteDB maintains a high level of performance while offering superior high availability. Unlike the Citus/PostgreSQL/Patroni configuration, YugabyteDB stands out by eliminating downtime associated with unplanned failures and maintenance processes such as OS patching and database upgrades.
Consistency and Resilience
Given the presence of a YugabyteDB laboratory environment, it is feasible to assess the system's resilience. This can be accomplished by isolating one node from the network using the command docker network disconnect -f
and validating that the application remains operational despite the network disruption.
docker ps -q
docker run -it --network yb \
-e PGHOST=n4.yb -e PGUSER=yugabyte -e PGPORT=5433 \
postgres pgbench -nT 60 --max-tries 10 yugabyte &
sleep 15
docker exec -it 5b63fe9e48e5 hostname
docker network disconnect -f yb 5b63fe9e48e5
The observed increase in latency in this short-duration run can be attributed to the TCP Timeout, which is set at 15 seconds. This concerns the first read or write operation for each tablet with its Raft Leader located on the temporarily unreachable node. Importantly, this delay does not result in application errors or timeouts. Subsequently, the reads and writes are directed towards the newly elected Leaders.
You can check the UI to understand what happened. The node was down for more than 3 seconds, and then the tablets have elected new Leaders to take over the ongoing reads and writes:
Should the network be re-established within a 15-minute window (which can be tested using docker network connect
), the tablet peers will synchronize to regain alignment, leading to the election of new Leaders for load rebalancing. However, suppose the network remains disconnected beyond this timeframe. In that case, new peers will initiate bootstrapping on the available nodes to reach three replicas per tablet and no peers in the node that is down for more than 15 minutes:
This illustrates YugabyteDB's capacity for automated recovery and fault tolerance.
To Summarize...
I've written these two posts to provide concrete facts, which can be easily tested using a conventional SQL application like pgbench
within a straightforward Docker-based laboratory setup. Beyond the traditional marketing assertions encompassing Scaling, Sharding, SQL, ACID, and PostgreSQL compatibility, a clear distinction emerges between:
Citus or any database sharding solution: In this context, each shard constitutes an individual PostgreSQL database. However, it's crucial to recognize that the cluster as a whole behaves differently from a standard PostgreSQL environment. It exclusively offers eventually consistent without certain critical SQL and ACID attributes.
YugabyteDB Distributed SQL: In contrast, YugabyteDB not only distributes connections, transactions, and storage but also presents itself as a unified logical database that spans multiple zones. It encompasses the entire spectrum of SQL features, with robust ACID properties and a commitment to strong consistency and can be used in place of PostgreSQL without changing the application code.
Additionally, we've gained insights into the architectural simplicity of YugabyteDB. All nodes are active, whether pods or virtual machines, and can seamlessly accept connections, DDL, DML, reads, and writes. This is achieved without the need for coordinator nodes or deliberation over synchronous or asynchronous standbys, which would not qualify for being Cloud Native. It is elastic: you can add more nodes online to scale the number of connections, the SQL executions, the reads and writes, and the volume.
Posted on August 15, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
December 23, 2023
November 18, 2023