Snapshot too old in YugabyteDB
Franck Pachot
Posted on April 24, 2024
With Multi-Version Concurrency Control, SQL databases store the current state plus enough information to read a previous state. Keeping all change records forever would not be scalable, so only enough history is kept to serve the oldest ongoing transaction. When long transactions are running, there are two possibilities:
- Let history grow. PostgreSQL's VACUUM cannot remove old row versions that are newer than the oldest transaction's horizon, so a long-running transaction lets bloat accumulate.
- Fail long transactions after a time limit, like Oracle's undo_retention or YugabyteDB's timestamp_history_retention_interval_sec.
The second solution prevents runaway queries from causing problems for other transactions. A growing MVCC history can impact performance and operations, affecting reads, memory, storage, and backups, so it's best to keep it from growing uncontrollably. However, the resulting error must give enough information to understand the reason and fix the runaway query.
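In YugabyteDB, this retention is a tablet server flag. As a minimal sketch (the 30-minute value is just an illustration, and I'm assuming the flag is passed through yugabyted's --tserver_flags option), it can be set when starting a node:
yugabyted start --tserver_flags="timestamp_history_retention_interval_sec=1800"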
An example
I start YugabyteDB in a Docker container:
-bash-4.2# docker run --rm -it yugabytedb/yugabyte bash
[root@066127d97d21 yugabyte]#
[root@066127d97d21 yugabyte]# yugabyted start
Starting yugabyted...
✅ YugabyteDB Started
✅ UI ready
✅ Data placement constraint successfully verified
...
+---------------------------------------------------------------------------------------------------------+
| yugabyted |
+---------------------------------------------------------------------------------------------------------+
| Status : Running. |
| Replication Factor : 1 |
| YugabyteDB UI : http://172.17.0.3:15433 |
| JDBC : jdbc:postgresql://172.17.0.3:5433/yugabyte?user=yugabyte&password=yugabyte |
| YSQL : bin/ysqlsh -h 172.17.0.3 -U yugabyte -d yugabyte |
| YCQL : bin/ycqlsh 172.17.0.3 9042 -u cassandra |
| Data Dir : /root/var/data |
| Log Dir : /root/var/logs |
| Universe UUID : 640b90ac-c720-418a-bfd4-03f3eb106bab |
+---------------------------------------------------------------------------------------------------------+
...
[root@066127d97d21 yugabyte]#
[root@066127d97d21 yugabyte]# ysqlsh -h $(hostname)
ysqlsh (11.2-YB-2.21.0.0-b0)
Type "help" for help.
yugabyte=#
I create a demo table with one row:
yugabyte=# create table demo
( id bigserial primary key , value text )
;
CREATE TABLE
yugabyte=# insert into demo(value)
select 'Hello World' from generate_series(1,1)
;
INSERT 0 1
I start a transaction, read from my table, and also display the transaction timestamp and the current clock time as epoch values:
yugabyte=# begin transaction isolation level repeatable read
;
BEGIN
yugabyte=# select * from demo
;
id | value
----+-------------
1 | Hello World
(1 row)
yugabyte=# select
current_setting('yb_effective_transaction_isolation_level')
, extract(epoch from transaction_timestamp()) as tx
, extract(epoch from clock_timestamp()) as clock
;
current_setting | tx | clock
-----------------+------------------+------------------
repeatable read | 1713905945.54437 | 1713905960.63649
(1 row)
Still in the transaction, I wait a few minutes and recheck the time. For this demo, I don't want to wait the default 15 minutes of retention, so I temporarily lower it to 60 seconds (with timestamp_history_retention_interval_sec) and run a full compaction:
yugabyte=# select
current_setting('yb_effective_transaction_isolation_level')
, extract(epoch from transaction_timestamp()) as tx
, extract(epoch from clock_timestamp()) as clock
;
current_setting | tx | clock
-----------------+------------------+------------------
repeatable read | 1713905945.54437 | 1713906114.97461
(1 row)
yugabyte=# \! yb-ts-cli --server_address=$(hostname) set_flag --force timestamp_history_retention_interval_sec 60
yugabyte=# \! yb-ts-cli --server_address=$(hostname) compact_all_tablets
Successfully compacted all tablets
yugabyte=# \! yb-ts-cli --server_address=$(hostname) set_flag --force timestamp_history_retention_interval_sec 900
I did not make any updates, but the database doesn't know that. Had I made any modifications, the versions from before the compaction time minus 60 seconds would have been removed. The database cannot guarantee a consistent read because my transaction's read time is older than that point: the snapshot required for the read is too old.
yugabyte=# select
current_setting('yb_effective_transaction_isolation_level')
, extract(epoch from transaction_timestamp()) as tx
, extract(epoch from clock_timestamp()) as clock
;
current_setting | tx | clock
-----------------+------------------+------------------
repeatable read | 1713905945.54437 | 1713906169.19518
(1 row)
yugabyte=# select * from demo;
ERROR: Snapshot too old. Read point: { physical: 1713905951675923 }, earliest read time allowed: { physical: 1713906057639729 }, delta (usec): 105963806: kSnapshotTooOld
The Snapshot too old message gives essential information: the read point (the start of my transaction, as I'm at the Repeatable Read isolation level) and the earliest read time allowed (determined by the retention when the compaction occurred). The delta is the difference between the two and gives an idea of the minimum retention that would have been required to run this query.
Here is the timeline with the numbers above:
1713905945.54437   Transaction time
1713905951.675923  Read point (snapshot)        <-----------------+ Snapshot
1713905960.63649   Clock at transaction start                     | Too Old
1713906057.639729  Earliest read time allowed   <-----+           |
1713906114.97461   Clock before compaction            |           |
  (timestamp_history_retention_interval_sec)       ---+           |
1713906169.19518   Clock before SELECT error                      |
  (transaction read time)                                 --------+
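The physical values in the error message are microseconds since the epoch. Here is a quick sketch to convert them to readable timestamps and check the delta (the literals are the ones from the error above):
yugabyte=# select to_timestamp(1713905951675923/1e6) as read_point
  , to_timestamp(1713906057639729/1e6) as earliest_allowed
  , (1713906057639729-1713905951675923)/1e6 as delta_seconds
;
The delta is about 106 seconds, matching the delta (usec): 105963806 from the error, and it hints at the minimum retention this query would have needed.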
When you encounter the "Snapshot too old" error, check whether the duration of the statement or transaction is expected, and if it is, increase the MVCC retention. If the query itself is the problem, fix it first.
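If the duration is legitimate, the retention can be raised with the same command used earlier in this demo (one hour here is only an illustration, and as far as I understand, set_flag is a runtime change that does not persist across restarts, so the flag should also be set in the server startup configuration):
yugabyte=# \! yb-ts-cli --server_address=$(hostname) set_flag --force timestamp_history_retention_interval_sec 3600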
yb_read_time
Note that in the latest version, you can set the read time yourself with yb_read_time. It is similar to Oracle's flashback query. It should be used cautiously, and only for specific use cases like recovery from errors.
Here is an example with my table:
yugabyte=# select * from demo;
id | value
----+-------------
1 | Hello World
(1 row)
I get the current time as an epoch value (in microseconds) and store it in a psql variable:
yugabyte=# select (1000000*extract(epoch from now()))::bigint now;
       now
------------------
 1713965275094928
(1 row)
yugabyte=# \gset
I use this variable to set the read time for a flashback query:
yugabyte=# set yb_read_time=:now;
NOTICE: 00000: yb_read_time should be set with caution.
DETAIL: No DDL operations should be performed while it is set and it should not be set to a timestamp before a
DDL operation has been performed. It does not have well defined semantics for normal transactions and is only to
be used after consultation
SET
A notice warns that it does not follow the usual SQL transaction semantics. That is because the current transaction's changes cannot be applied to a past state.
I insert a new row and query my table:
yugabyte=# insert into demo(value)
select 'Hello Again' from generate_series(1,1)
;
INSERT 0 1
yugabyte=# select * from demo;
id | value
----+-------------
1 | Hello World
(1 row)
The new row is invisible because my read point is before the insert.
I can see the new row if I revert to the standard SQL behavior, where the read time is the beginning of the transaction or statement:
yugabyte=# set yb_read_time=0;
NOTICE: 00000: yb_read_time should be set with caution.
DETAIL: No DDL operations should be performed while it is set and it should not be set to a timestamp before a
DDL operation has been performed. It does not have well defined semantics for normal transactions and is only to
be used after consultation
SET
yugabyte=# select * from demo;
id | value
----+-------------
2 | Hello Again
1 | Hello World
(2 rows)
The notice also says that it should not be set to a time before a DDL operation. If I go too far back in time, no rows are displayed because the table didn't exist at that time (but the query is still parsed with the current catalog):
yugabyte=# set yb_read_time=1713879482182794;
NOTICE: 00000: yb_read_time should be set with caution.
DETAIL: No DDL operations should be performed while it is set and it should not be set to a timestamp before a DDL operation has been performed. It does not have well defined semantics for normal transactions and is only to be used after consultation
SET
yugabyte=# select * from demo;
id | value
----+-------
(0 rows)
I'll let you try a date in the future. It will show the current version because the database is unaware of future changes.
To summarize
"Snapshot Too Old" error is expected for transactions (in Repeatable Read or Serializable isolation level) or queries (in Read Committed) that run more than the MVCC retention. It defaults to 15 minutes but can be increased with timestamp_history_retention_interval_sec.
Note that this retention value can be internally increased to support database snapshots and allow Point In Time Recovery at any time between two snapshots. Additionally, the read point can be set to query a past point in time, which can be used to recover from errors in the current database or a clone.
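For example, here is a minimal sketch of such a recovery with yb_read_time. The :before_delete variable is a hypothetical epoch value in microseconds, captured before the mistake, and the rows are exported with a read rather than DDL (no DDL must run while the read time is set):
yugabyte=# set yb_read_time = :before_delete;
yugabyte=# \copy demo to '/tmp/demo_before_delete.csv' csv
yugabyte=# set yb_read_time = 0;
After resetting yb_read_time to zero, the exported rows can be loaded back where needed.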
The SQLSTATE of snapshot_too_old is 72000 (Class 72 — Snapshot Failure), and it is not considered a retryable error like a serialization failure (Class 40 — Transaction Rollback). If you restart the transaction, it will acquire a newer read timestamp and may succeed. In the Read Committed isolation level, restarting a statement within an open transaction will also obtain a newer read timestamp. Nonetheless, if the restarted statement takes approximately the same amount of time as before, there is a significant chance it will hit the same error again. The solution for snapshot_too_old is to understand the query duration and decide whether the query must be tuned or the retention must be increased.
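To see the SQLSTATE from ysqlsh, you can raise the verbosity before reproducing the error:
yugabyte=# \set VERBOSITY verbose
yugabyte=# select * from demo;
With verbose output, the ERROR line shows the 72000 code, which an application can also read from the error to distinguish it from retryable 40001 serialization failures.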
In PostgreSQL 17, the equivalent feature (old_snapshot_threshold, which raised "snapshot too old" errors) has been removed due to a number of known correctness and performance problems. This is unfortunate, as it was the only way to prevent long-running transactions from causing indefinite table bloat. In contrast, YugabyteDB offers reliable, efficient, and scalable MVCC (Multi-Version Concurrency Control) snapshots by design, thanks to its built-in Raft replication and LSM (Log-Structured Merge) Tree storage. Limiting MVCC retention is essential for critical Online Transaction Processing (OLTP) systems that require predictable performance.