Another funny benchmark: "hugedbbench"

Franck Pachot

Posted on July 27, 2023

I got a question about this "benchmark", whose published results are rather funny.



If this were true, it would be a great win for all the Distributed SQL databases tested there: response times in microseconds 😂. Well, I guess this one is a typo.

For the fun of it, I'll run their benchmark with YugabyteDB.

Single-node cluster

I start a single-node YugabyteDB cluster:

docker network create yb
docker run -d --name yb-n1 --hostname n1 --network yb -p15433:15433 yugabytedb/yugabyte yugabyted start --advertise_address=n1.yb --background=false

I exposed port 15433, which serves the YugabyteDB console.

I run this "hugedbbench" (from the last commit that works, because it seems broken now) and display the query response times from pg_stat_statements:

docker exec -it yb-n1 bash -c '
# get the project
dnf install -y git golang
git clone https://github.com/kokizzu/hugedbbench.git
cd hugedbbench/2021/yugabytedb
git checkout 0fdc96905d319751a79cc386f78f79e64ab22d43

# drop the table left by a previous run and reset pg_stat_statements
ysqlsh -h n1.yb -c "
drop table if exists bar1
" -c "
select pg_stat_statements_reset()
"

# run the "benchmark"
sed -e "s/127.0.0.1/n1.yb/" -i main_test.go
go test

# show execution statistics
ysqlsh -h n1.yb -c "
select mean_time, min_time, max_time, query, calls, rows
from pg_stat_statements order by 1
" -c "
select pg_size_pretty(pg_table_size(tablename::regclass)) from pg_tables where tableowner=user
"

'

It runs for a few minutes and displays the total time.

Here is the result from pg_stat_statements:

    mean_time     |  min_time  |  max_time  |                                     query                                      | calls  |  rows
------------------+------------+------------+--------------------------------------------------------------------------------+--------+--------
         0.231428 |   0.231428 |   0.231428 | select pg_stat_statements_reset()                                              |      1 |      1
 1.82675776968002 |   0.279921 |  71.312284 | SELECT foo FROM bar1 WHERE id=$1                                               | 100000 | 100000
 4.66872027791997 |   0.690059 | 216.301717 | INSERT INTO bar1(id,foo) VALUES($1,$2)                                         | 100000 | 100000
 5.46740847204005 |   1.285634 | 424.903964 | UPDATE bar1 SET foo=$1 WHERE id=$2                                             | 100000 | 100000
      263.1718355 | 122.983744 | 403.359927 | SELECT COUNT($1) FROM bar1                                                     |      2 |      2
       830.000174 | 830.000174 | 830.000174 | TRUNCATE TABLE bar1                                                            |      1 |      0
       1413.53706 | 1413.53706 | 1413.53706 | CREATE TABLE IF NOT EXISTS bar1(id BIGINT PRIMARY KEY, foo VARCHAR(10) UNIQUE) |      1 |      0
(7 rows)

 pg_size_pretty
----------------
 132 MB
(1 row)


The single-row DML statements average single-digit milliseconds, which is what is expected. You can run the same with other databases, but whatever the result, the numbers don't really make sense.
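To make the units explicit, here is a minimal sketch that condenses the three single-row statements from the pg_stat_statements output above. The rows are copied from that output (times are in milliseconds); the `stats` variable and the `summarize` helper are names made up for this illustration.

```shell
#!/usr/bin/env bash
# Rows copied from the pg_stat_statements output above, "|"-separated:
# mean_time | min_time | max_time | query | calls | rows   (times in ms)
stats='1.82675776968002|0.279921|71.312284|SELECT foo FROM bar1 WHERE id=$1|100000|100000
4.66872027791997|0.690059|216.301717|INSERT INTO bar1(id,foo) VALUES($1,$2)|100000|100000
5.46740847204005|1.285634|424.903964|UPDATE bar1 SET foo=$1 WHERE id=$2|100000|100000'

# summarize (hypothetical helper): print the statement verb, the mean in
# milliseconds and in microseconds, and the slowest call in milliseconds.
summarize() {
  awk -F'|' '{
    split($4, w, " ")
    printf "%s mean=%.2fms (%.0fus) max=%.0fms\n", w[1], $1, $1 * 1000, $3
  }'
}

printf '%s\n' "$stats" | summarize
```

The means are in the low thousands of microseconds, and the slowest calls are in the tens to hundreds of milliseconds.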

The query stats are also visible in the console.

Replication Factor 3 cluster

You can add more nodes to the YugabyteDB cluster:

docker run -d --name yb-n2 --hostname n2 --network yb yugabytedb/yugabyte yugabyted start --join=yb-n1 --advertise_address=n2.yb --background=false
docker run -d --name yb-n3 --hostname n3 --network yb yugabytedb/yugabyte yugabyted start --join=yb-n1 --advertise_address=n3.yb --background=false

The data protection has switched to Replication Factor 3.

However, this benchmark is not relevant for such a configuration because it is single-threaded. When you distribute the database, you expect to distribute the load. And if you do it for High Availability only, then you want the nodes in different zones. That's why I stop here. There are much more meaningful workloads to run if you want to compare databases.
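A rough way to see why a single-threaded run says little about a distributed database: a client that issues statements one at a time on one connection cannot exceed 1000 / mean_time statements per second, whatever the number of nodes. The sketch below applies that arithmetic to the mean times measured above. The `max_ops_per_sec` helper is a name made up here, and the figures are idealized upper bounds that ignore network round trips and contention between sessions.

```shell
#!/usr/bin/env bash
# max_ops_per_sec SESSIONS MEAN_MS (hypothetical helper)
# Idealized upper bound: each session completes 1000 / MEAN_MS statements
# per second, and sessions are assumed not to slow each other down.
max_ops_per_sec() {
  awk -v n="$1" -v ms="$2" 'BEGIN { printf "%.0f\n", n * 1000 / ms }'
}

max_ops_per_sec 1  1.82675776968002   # one session, SELECT by primary key: 547
max_ops_per_sec 1  5.46740847204005   # one session, single-row UPDATE: 183
max_ops_per_sec 10 5.46740847204005   # ten sessions, if latency stayed the same: 1829
```

Only the last kind of run, with many concurrent sessions connected to different nodes, starts to exercise what a distributed cluster is built for.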

All databases also have some optimizations that may not be enabled by default (for different reasons: backward compatibility and rolling upgrades are some of them). If you want to run it in an optimized way on YugabyteDB, I suggest enabling Packed Rows and Colocation.
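One possible way to try those two options is sketched below. It is not taken from the benchmark run above, and it assumes names to check against the YugabyteDB documentation for your version: the `ysql_enable_packed_row` tablet server flag, the `--tserver_flags` option of `yugabyted start`, and the `WITH COLOCATION = true` clause of `CREATE DATABASE`. The database name `bench` is made up for the example.

```shell
# Assumption: replaces the single-node start command used earlier
# (remove the existing yb-n1 container first).
docker run -d --name yb-n1 --hostname n1 --network yb -p15433:15433 \
  yugabytedb/yugabyte yugabyted start --advertise_address=n1.yb \
  --background=false --tserver_flags="ysql_enable_packed_row=true"

# Assumption: colocation is chosen per database at creation time, so the
# benchmark would have to connect to this database ("bench" is hypothetical).
docker exec yb-n1 ysqlsh -h n1.yb -c "create database bench with colocation = true"
```

Depending on the release, packed rows may already be enabled by default, in which case the flag changes nothing.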
