UPSERT all columns in YugabyteDB (ON CONFLICT DO UPDATE)
Franck Pachot
Posted on January 8, 2024
In a previous post, we implemented an ON CONFLICT DO UPDATE with Common Table Expressions to get better performance and row-count feedback. The idea was to update only a few columns for existing keys: the fewer columns you update, the fewer indexes you have to maintain.
Here is another case, where we want to update all columns. This does the same as the yb_enable_upsert_mode mentioned in the previous post, but without the inconvenience of rebuilding all secondary indexes. When there are many columns to update, it is faster in YugabyteDB to delete and re-insert the rows than to update them. The reason is that an INSERT packs each SQL row into a single RocksDB entry, whereas an UPDATE writes one entry per column, which is only packed later by compaction.
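If you don't have the setup from the previous post at hand, a minimal schema that fits the query below could look like the following. This is an assumption for illustration, not the exact DDL from that post: the column types and the demo_v1 index name are made up, but the Index Write Requests in the plan further down show that demo has at least one secondary index.
-- assumed minimal schema (not the exact DDL from the previous post)
create table demo (
 k  bigint primary key,
 v0 int,
 v1 int
);
-- a secondary index, so that index maintenance shows up in the plan
create index demo_v1 on demo (v1);
-- the new values to merge into demo
create table demo_new (
 k  bigint primary key,
 v1 int
);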
Given the tables created in the previous post (a minimal version is sketched above), the following query deletes all rows in demo whose keys appear in demo_new, and then re-inserts them with v1 from demo_new and v0 from demo (or null if the key did not exist):
set yb_bnl_batch_size=1024;
with
upsert(k,v1) as (
 -- the new values to merge
 select * from demo_new
),
deleted(k) as (
 -- delete the existing rows for the keys being merged
 delete from demo using upsert
 where demo.k = upsert.k
 returning demo.*
),
inserted(k) as (
 -- re-insert all keys, keeping v0 from the deleted rows when it existed
 insert into demo(k,v0,v1) select upsert.k, deleted.v0, upsert.v1
 from upsert left outer join deleted on deleted.k = upsert.k
 returning k
)
-- report how many rows were pre-existing (deleted) and how many are new
select deleted, inserted-deleted as inserted from
(select count(*) as deleted from deleted) as delete_count,
(select count(*) as inserted from inserted) as insert_count
;
The result is:
deleted | inserted
---------+----------
2 | 3
(1 row)
yugabyte=*# select * from demo order by 1;
k | v0 | v1
----+----+----
1 | 0 | 1
2 | 0 | 2
3 | 0 | 1
4 | 0 | 2
5 | 0 | 1
6 | | 2
8 | | 2
10 | | 2
(8 rows)
Here is the execution plan with read and write requests (explain (analyze, dist)):
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=399.67..399.72 rows=1 width=16) (actual time=5.850..5.851 rows=1 loops=1)
CTE upsert
-> Function Scan on generate_series i (cost=0.00..12.50 rows=1000 width=8) (actual time=0.011..0.012 rows=5 loops=1)
CTE deleted
-> Delete on demo (cost=0.00..20.01 rows=1000 width=116) (actual time=1.358..1.375 rows=2 loops=1)
Storage Table Write Requests: 2.000
Storage Index Write Requests: 2.000
-> YB Batched Nested Loop Join (cost=0.00..20.01 rows=1000 width=116) (actual time=1.315..1.322 rows=2 loops=1)
Join Filter: (demo.k = upsert.k)
-> CTE Scan on upsert (cost=0.00..20.00 rows=1000 width=32) (actual time=0.016..0.019 rows=5 loops=1)
-> Index Scan using demo_pkey on demo (cost=0.00..4.11 rows=1 width=88) (actual time=1.209..1.212 rows=2 loops=1)
Index Cond: (k = ANY (ARRAY[(upsert.k)::bigint, ($3)::bigint, ($4)::bigint, ..., ($1025)::bigint]))
Storage Table Read Requests: 1
Storage Table Read Execution Time: 1.045 ms
CTE inserted
-> Insert on demo demo_1 (cost=139.66..232.16 rows=5000 width=16) (actual time=4.398..4.452 rows=5 loops=1)
Storage Table Write Requests: 5.000
Storage Index Write Requests: 5.000
Storage Flush Requests: 1
Storage Flush Execution Time: 4.284 ms
-> Merge Left Join (cost=139.66..232.16 rows=5000 width=16) (actual time=0.026..0.033 rows=5 loops=1)
Merge Cond: (upsert_1.k = deleted_1.k)
-> Sort (cost=69.83..72.33 rows=1000 width=8) (actual time=0.016..0.016 rows=5 loops=1)
Sort Key: upsert_1.k
Sort Method: quicksort Memory: 25kB
-> CTE Scan on upsert upsert_1 (cost=0.00..20.00 rows=1000 width=8) (actual time=0.000..0.001 rows=5 loops=1)
-> Sort (cost=69.83..72.33 rows=1000 width=12) (actual time=0.007..0.007 rows=2 loops=1)
Sort Key: deleted_1.k
Sort Method: quicksort Memory: 25kB
-> CTE Scan on deleted deleted_1 (cost=0.00..20.00 rows=1000 width=12) (actual time=0.001..0.001 rows=2 loops=1)
-> Aggregate (cost=22.50..22.51 rows=1 width=8) (actual time=1.383..1.383 rows=1 loops=1)
-> CTE Scan on deleted (cost=0.00..20.00 rows=1000 width=0) (actual time=1.361..1.379 rows=2 loops=1)
-> Aggregate (cost=112.50..112.51 rows=1 width=8) (actual time=4.463..4.463 rows=1 loops=1)
-> CTE Scan on inserted (cost=0.00..100.00 rows=5000 width=0) (actual time=4.401..4.458 rows=5 loops=1)
Planning Time: 1.597 ms
Execution Time: 10.475 ms
Storage Read Requests: 1
Storage Read Execution Time: 1.045 ms
Storage Write Requests: 14.000
Catalog Read Requests: 0
Catalog Write Requests: 0.000
Storage Flush Requests: 2
Storage Flush Execution Time: 7.503 ms
Storage Execution Time: 8.548 ms
Peak Memory Usage: 4959 kB
There are 14 Write Requests, for 2 rows deleted and 5 rows inserted (3 new ones and 2 from the deleted rows joined to the new values), covering the table and the secondary index. This number can be higher if there are more secondary indexes. But those writes are batched, and we see only 2 Flush Requests, one for the delete and one for the insert, which is where network latency adds to the response time. This is scalable because it doesn't depend on the number of rows.
Is it better to UPDATE first and INSERT the remaining rows, or to DELETE and INSERT all rows? This depends on the expected number of existing keys vs. new ones, the number of columns, and the number of indexes on those columns. That's the main advantage of using the standard SQL WITH clause rather than the PostgreSQL-specific ON CONFLICT: more control over how the new values are merged with the existing rows.
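For comparison, here is a sketch of the UPDATE-first variant on the same assumed schema: it updates the rows whose key already exists (touching only the column that changed) and inserts only the remaining new keys, leaving v0 untouched for existing rows.
with
upsert(k,v1) as (
 -- the new values to merge
 select * from demo_new
),
updated(k) as (
 -- update the rows whose key already exists
 update demo set v1 = upsert.v1
 from upsert
 where demo.k = upsert.k
 returning demo.k
)
-- insert only the keys that were not updated
insert into demo(k,v1)
select upsert.k, upsert.v1
from upsert
where not exists (select 1 from updated where updated.k = upsert.k)
;
As with the DELETE and INSERT version, you can add RETURNING clauses and counts on the CTEs to get row-count feedback.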