Scaling Token Buckets Rate limiting with YugabyteDB
Franck Pachot
Posted on January 6, 2022
In the Part 4 of this series, I've run my RateLimitDemo.java
program on PostgreSQL in Amazon RDS with and without updating the same rows, in Read Committed and Serializable isolation. The results were:
- rate: 1000 token/s (100% transactions successful) with 50 threads on different
id
s in Read Committed isolation level - rate: 930 token/s (100% transactions successful) with 50 threads on different
id
s in Serializable isolation level - rate: 124 token/s (100.00% transactions successful) with 50 threads on the same
id
s in Read Committed isolation level - rate: 218 token/s (83% transactions successful) with 10 threads on the same
id
s in Serializable isolation level
YugabyteDB Cloud
In the same idea of using a managed service I've created a 3 nodes YugabyteDB cluster on AWS through the Yugabyte DBaaS:
YugabyteDB READ COMMITTED different id
In my RateLimitDemo.java
I change the id
to concatenate the session pid: rate_limiting_request(?||pg_backend_pid(),?)
and I set Read Committed isolation level: connection.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED)
java RateLimitDemo 50 "jdbc:yugabytedb://47cc8863-9344-4a9c-bc02-0dd9f843dceb.cloudportal.yugabyte.com/yugabyte?user=admin&password=Covid-19" "user2" 1000 20 | awk 'BEGIN{t=systime()}/remaining$/{c=c+1;p=100*$5/$3}NR%100==0{printf "rate: %8.2f/s (last pct: %5.2f) max retry:%3d\n",c/(systime()-t),p,retry}/retry/{sub(/#/,"",$6);if($6>retry)retry=$6}'
The Read Committed is provided for PostgreSQL compatibility but is not recommended, so I'm just showing that the rate is the same as with PostgreSQL:
rate: 1063.08/s (last pct: 100.00) max retry: 1
rate: 1063.32/s (last pct: 100.00) max retry: 1
rate: 1063.56/s (last pct: 100.00) max retry: 1
rate: 1063.80/s (last pct: 100.00) max retry: 1
rate: 1064.05/s (last pct: 100.00) max retry: 1
rate: 1064.29/s (last pct: 100.00) max retry: 1
rate: 1064.53/s (last pct: 100.00) max retry: 1
rate: 1062.20/s (last pct: 100.00) max retry: 1
rate: 1062.44/s (last pct: 100.00) max retry: 1
rate: 1062.68/s (last pct: 100.00) max retry: 1
rate: 1062.92/s (last pct: 100.00) max retry: 1
rate: 1063.16/s (last pct: 100.00) max retry: 1
There's one difference with PostgreSQL which, using pessimistic locking had no retries. YugabyteDB uses optimistic locking here.
YugabyteDB SERIALIZABLE different id
In my RateLimitDemo.java
I keep the id
to concatenated the session pid: rate_limiting_request(?||pg_backend_pid(),?)
and I set Read Committed isolation level: connection.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE)
The throughput is the same:
rate: 1079.99/s (last pct: 100.00) max retry: 2
rate: 1079.38/s (last pct: 100.00) max retry: 2
rate: 1079.44/s (last pct: 100.00) max retry: 2
rate: 1079.51/s (last pct: 100.00) max retry: 2
rate: 1079.57/s (last pct: 100.00) max retry: 2
rate: 1079.63/s (last pct: 100.00) max retry: 2
rate: 1079.69/s (last pct: 100.00) max retry: 2
rate: 1079.75/s (last pct: 100.00) max retry: 2
rate: 1079.81/s (last pct: 100.00) max retry: 2
rate: 1079.87/s (last pct: 100.00) max retry: 2
rate: 1079.94/s (last pct: 100.00) max retry: 2
rate: 1080.00/s (last pct: 100.00) max retry: 2
Serializable is the right isolation level. The amount of retries is still low.
Here are the major performance metrics during this run:
- 1000 YSQL ops/s matches the rate of tokens per second because each token request is one UPDATE statement (only the first one for each
id
has an additional insert) - 1000 ops/s is the number of operations in DocDB, the distributed storage and transaction layer, where operations are replicated to the 3 nodes with the Raft protocol. Here, each UPDATE ... RETURNING is a transactional read+write and shows up as one read and one write. There are some cases when the RETURNING is pushed down to be one operation but that's not done currently on calculated values (because the calculation is done on the YSQL layer)
- 5 millisecond latency on average for those UPDATE. Because the table is hash sharded on the
id
those operations are sent to the right node, the tablet leader, and wait to get the write quorum from another node (I have a multi-AZ configuration here) - I've displayed compaction to explain the little increase of latency from 5ms to 7ms when compaction is occurring in the background. All updates are stored as new versions for the column value and SST file compaction gets rid of previous versions.
We have a view on pg_stat_statements confirming this average time of 5 milliseconds.
"local block" stats are zero because there's no temporary table involved here. We do not display shared block hits because YSQL is stateless. The PostgreSQL code is reused only above the table access method, which transforms tuple manipulation to DocDB operations.
So, this is about running this Token Bucket algorithm without collisions. The performance is the same as with PostgreSQL, with the additional possibility to scale out. The load is distributed to 3 nodes here, and replication factor RF=3 let all continue transparently if one node goes down for planned maintenance of unplanned outage.
PostgreSQL READ COMMITTED with same id
Now stressing the race condition with 50 threads on the same id
. In my RateLimitDemo.java
I put back the id
alone: rate_limiting_request(?,?)
and I set Read Committed isolation level: connection.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED)
rate: 126.99/s (last pct: 96.69) max retry: 4
rate: 127.59/s (last pct: 84.78) max retry: 4
rate: 127.39/s (last pct: 95.62) max retry: 4
rate: 127.17/s (last pct: 98.37) max retry: 4
rate: 126.97/s (last pct: 96.71) max retry: 4
rate: 127.58/s (last pct: 95.68) max retry: 4
rate: 127.38/s (last pct: 97.62) max retry: 4
rate: 127.15/s (last pct: 96.68) max retry: 4
rate: 127.76/s (last pct: 88.00) max retry: 4
rate: 127.54/s (last pct: 92.53) max retry: 4
rate: 127.34/s (last pct: 94.48) max retry: 4
rate: 127.91/s (last pct: 92.85) max retry: 4
rate: 127.69/s (last pct: 92.68) max retry: 4
rate: 127.50/s (last pct: 92.87) max retry: 4
rate: 128.02/s (last pct: 83.57) max retry: 4
rate: 127.78/s (last pct: 95.61) max retry: 4
rate: 128.33/s (last pct: 95.61) max retry: 4
rate: 128.14/s (last pct: 95.66) max retry: 4
rate: 128.74/s (last pct: 98.04) max retry: 4
rate: 128.51/s (last pct: 92.40) max retry: 4
rate: 128.29/s (last pct: 94.63) max retry: 4
The rate is much lower here, like it was with PostgreSQL, because all threads compete to update the same row. First, only one node is busy, as this row is in one tablet. Second, there are retries, even in Read Committed isolation level.
Here is a the error cought:
(pid@host 23119@10.8.11.222) 3771 calls 3487 tokens 3.8 /sec 60000 remaining
(pid@host 22797@10.8.11.222) 2706 calls 2460 tokens 2.6 /sec 60000 remaining
(pid@host 23431@10.8.13.238) 1513 calls 1422 tokens 1.5 /sec 60000 remaining
2022-01-05T23:00:26.369987Z SQLSTATE 40001 on retry #0 com.yugabyte.util.PSQLException: ERROR: All transparent retries exhausted. Operation failed. Try again.: Value write after transaction start: { physical: 1641423626346947 } >= { physical: 1641423625848215 }: kConflict
2022-01-05T23:00:26.371342Z SQLSTATE 40001 on retry #0 com.yugabyte.util.PSQLException: ERROR: Operation expired: Transaction aborted: kAborted
(pid@host 22820@10.8.8.14) 2568 calls 2482 tokens 2.7 /sec 60000 remaining
(pid@host 23082@10.8.8.14) 2272 calls 2186 tokens 2.4 /sec 60000 remaining
This is the one I catch in my program, to retry it after a few milliseconds, but it was already retried automatically in the database (All transparent retries exhausted
). This explains why it is slower even with a small number of application retries.
PostgreSQL SERIALIZABLE with same id
Last test, in my RateLimitDemo.java
I put back the id
alone: rate_limiting_request(?,?)
and I set Read Committed isolation level: connection.setTransactionIsolation(Connection.SERIALIZABLE)
Given the previous result, I've reduced the number of threads to 10, as I did with PostgreSQL, and run it during the night. Here is the rate of tokens acquired per seconds:
rate: 116.60/s (last pct: 93.95) max retry: 8
rate: 116.60/s (last pct: 93.25) max retry: 8
rate: 116.60/s (last pct: 92.66) max retry: 8
rate: 116.60/s (last pct: 88.60) max retry: 8
rate: 116.60/s (last pct: 88.48) max retry: 8
rate: 116.60/s (last pct: 93.95) max retry: 8
rate: 116.59/s (last pct: 88.60) max retry: 8
rate: 116.59/s (last pct: 93.92) max retry: 8
rate: 116.59/s (last pct: 93.92) max retry: 8
rate: 116.59/s (last pct: 92.66) max retry: 8
rate: 116.59/s (last pct: 87.81) max retry: 8
rate: 116.59/s (last pct: 88.73) max retry: 8
rate: 116.59/s (last pct: 87.81) max retry: 8
rate: 116.58/s (last pct: 91.75) max retry: 8
rate: 116.58/s (last pct: 92.66) max retry: 8
rate: 116.58/s (last pct: 93.25) max retry: 8
rate: 116.58/s (last pct: 93.95) max retry: 8
rate: 116.58/s (last pct: 87.81) max retry: 8
rate: 116.58/s (last pct: 91.75) max retry: 8
rate: 116.59/s (last pct: 93.95) max retry: 8
The throughput is low, given the update conflicts with all threads on the same row, but the latency stays good: 7344625.95/3387433=2.2 milliseconds on average for this query (the rate_limiting_request(?,?)
one):
The other one 9869491.81/2149291=4.6ms was the previous test on different id
. Yes, in this race condition, throughput is lower as it cannot be distributed but response time is higher given data locality in RAM and CPU).
The performance metrics show the same:
The night run is in the middle (I started another test with 10 threads on same id
and 50 threads on different ones, which is more realistic), with low CPU usage. If you are in this race condition with all token requests on few users or tenants IDs, and need higher throughput, this Token Bucket is not scalable. I'll show another algorithm in the next posts.
Posted on January 6, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.