PostgreSQL lock trees
Bolaji Wahab
Posted on January 13, 2022
There are two forms of locks in PostgreSQL, row level and object level. While most operations would simply take row level locks, migrations usually take the more aggressive ones (object level locks).
PostgreSQL keeps a queue of locks, with first in first out (FIFO). While readers do not block writers, often times migration ends up blocking several other operations on the database leading to lock contention. This is due to the FIFO nature of the lock queue, there is no jumping the queue.
Imagine a migration which needs to alter a table (requiring access exclusive lock on the table), it blocks other operations even those doing ordinary SELECT
. Also, the migration might have to wait to take the necessary lock if there are conflicting locks on the table leading to lock contention.
Often times when there is a lock contention, we end up with questions such as what (human or application) is the main root of the lock contention.
Here is an example of other people's work on the subject which you might find helpful:
postgresql-lock-trees by Nikolay Samokhvalov.
After experimenting with the above example and the drawback being the call on the function pg_blocking_pids(int)
, I started looking at how we can simply do away with calling the function.
Then I rewrote the query, relying on lock conflicts as listed on the documentation Conflicting Lock Modes, pg_locks
and pg_stat_activity
.
The query:
WITH conflicts(lock, conflict) AS (VALUES
('AccessShare', 'AccessExclusive'), ('RowShare', 'Exclusive'),
('RowShare', 'AccessExclusive'), ('RowExclusive', 'Share'),
('RowExclusive', 'ShareRowExclusive'), ('RowExclusive', 'Exclusive'),
('RowExclusive', 'AccessExclusive'), ('ShareUpdateExclusive', 'ShareUpdateExclusive'),
('ShareUpdateExclusive', 'Share'), ('ShareUpdateExclusive', 'ShareRowExclusive'),
('ShareUpdateExclusive', 'Exclusive'), ('ShareUpdateExclusive', 'AccessExclusive'),
('Share', 'RowExclusive'), ('Share', 'ShareUpdateExclusive'),
('Share', 'ShareRowExclusive'), ('Share', 'Exclusive'),
('Share', 'AccessExclusive'), ('ShareRowExclusive','RowExclusive'),
('ShareRowExclusive', 'ShareUpdateExclusive'), ('ShareRowExclusive', 'Share'),
('ShareRowExclusive', 'ShareRowExclusive'), ('ShareRowExclusive', 'Exclusive'),
('ShareRowExclusive', 'AccessExclusive'), ('Exclusive', 'RowShare'),
('Exclusive', 'RowExclusive'), ('Exclusive', 'ShareUpdateExclusive'),
('Exclusive', 'Share'), ('Exclusive', 'ShareRowExclusive'),
('Exclusive', 'Exclusive'), ('Exclusive', 'AccessExclusive'),
('AccessExclusive', 'AccessShare'), ('AccessExclusive', 'RowShare'),
('AccessExclusive', 'RowExclusive'), ('AccessExclusive', 'ShareUpdateExclusive'),
('AccessExclusive', 'Share'), ('AccessExclusive', 'ShareRowExclusive'),
('AccessExclusive','Exclusive'), ('AccessExclusive','AccessExclusive')
),
activity AS (
SELECT
datname,
pid,
usename,
application_name,
client_addr,
xact_start,
query_start,
state_change,
backend_xid,
backend_xmin,
state,
wait_event_type,
wait_event,
query
FROM pg_stat_activity
WHERE pid != pg_backend_pid()
),
locks AS (
SELECT
pid,
locktype,
granted,
relation::regclass AS relation,
left(mode,-4) AS mode,
ROW(locktype,database,relation::regclass,page,tuple,virtualxid,transactionid,classid,objid,objsubid) AS obj
FROM pg_locks
),
pairs AS (
SELECT
w.pid AS waiter,
l.pid AS locker,
l.obj,
l.mode,
l.relation
FROM locks AS w
JOIN locks AS l ON l.obj IS NOT DISTINCT FROM w.obj AND l.locktype = w.locktype AND NOT l.pid = w.pid AND l.granted
WHERE NOT w.granted
),
leaders AS (
SELECT
DISTINCT(locker),
coalesce(relation, (select max(relation) from locks where pid = locker)) AS relation,
mode
FROM (SELECT *, rank() OVER (PARTITION BY locker ORDER BY relation NULLS FIRST) AS pos FROM pairs) AS ss
WHERE pos = 1
),
lock_queue AS (
SELECT
array_agg(l.pid::text || '.' || l.mode order by p.query_start) AS queue,
rel.relation
FROM locks AS l
JOIN activity p ON p.pid = l.pid
JOIN activity a ON a.pid = l.pid,
LATERAL (SELECT coalesce(relation, max(relation)) AS relation FROM locks WHERE pid = l.pid AND relation IS NOT NULL GROUP BY relation) AS rel
WHERE NOT l.granted
GROUP BY rel.relation
),
push_root_on_lock_queue AS (
SELECT
l.relation,
array_prepend(p.locker::text || '.' || p.mode, queue) AS queue
FROM lock_queue AS l
JOIN leaders AS p ON l.relation = p.relation
),
lock_queue_element_num AS (
SELECT
relation,
a.element,
a.num
FROM push_root_on_lock_queue, unnest(queue) WITH ORDINALITY a(element, num)
),
main AS (
SELECT
a.relation,
split_part(a.element, '.', 1) AS prev_pid,
split_part(a.element, '.', 2) AS prev_mode,
split_part(b.element, '.', 1) AS pid,
split_part(b.element, '.', 2) AS mode,
a.num AS start,
b.num AS next,
CASE
WHEN EXISTS (SELECT * FROM conflicts AS c WHERE c.lock = split_part(a.element, '.', 2) AND c.conflict = split_part(b.element, '.', 2)) THEN true
ELSE false
END AS conflict
FROM lock_queue_element_num AS a
JOIN lock_queue_element_num b ON a.relation = b.relation
WHERE a.num < b.num
ORDER BY b.num DESC
),
tree AS (
SELECT
DISTINCT ON (pid, prev_pid)
*,
dense_rank() OVER (PARTITION BY relation ORDER BY start) - 1 AS level
FROM main
),
stat AS (
SELECT
m.pid,
p.locker,
array_agg(DISTINCT m.prev_pid || ':' || m.level) AS blocked_by
FROM pairs AS p
JOIN tree AS m ON p.waiter = m.pid::int
WHERE conflict
AND m.pid != m.prev_pid
GROUP BY m.pid, p.locker
),
/* sometimes we can miss some blockers in pairs due to the lock path, the query should be fix normally but for now this is workaround
TODO: fix pairs query to include all path
*/
sub AS (
SELECT
pidd,
count(pidd) AS blocked_cnt
FROM (
SELECT DISTINCT ON (pid) split_part(unnest(blocked_by), ':', 1) AS pidd
FROM stat AS s
) AS blocked_cnt
WHERE NOT EXISTS (SELECT FROM pairs AS p WHERE p.locker = blocked_cnt.pidd::int)
GROUP BY pidd
),
result AS (
/* add details of waiters */
SELECT
DISTINCT(s.pid::int),
s.blocked_by,
(clock_timestamp() - a.xact_start)::interval(0) AS ts_age,
(clock_timestamp() - a.state_change)::interval(0) AS change_age,
CASE
WHEN s.blocked_by = '{}' THEN NULL::interval(0)
ELSE (clock_timestamp() - a.query_start)::interval(0)
END AS blocking_age,
a.backend_xid AS xid,
a.backend_xmin AS xmin,
replace(a.state, 'idle in transaction', 'idletx') AS state,
a.datname,
a.usename,
a.client_addr,
a.application_name,
a.wait_event_type || ':' || a.wait_event AS wait,
blocked.cnt AS blocked_cnt,
CASE
WHEN s.blocked_by = '{}' THEN true
ELSE false
END AS root,
trim(trailing ';' from format(
'%s %s%s',
'[' || s.pid::text || ']',
coalesce(repeat('.', array_length(s.blocked_by, 1)+1), '') || ' ',
left(query, 1000)
)) AS query
FROM stat AS s
JOIN activity AS a ON s.pid::int = a.pid,
LATERAL (
SELECT
coalesce((CASE
WHEN count(p.waiter) = 0 THEN (SELECT blocked_cnt FROM sub WHERE sub.pidd = s.pid UNION ALL SELECT 0 FROM sub WHERE NOT EXISTS (SELECT FROM sub WHERE sub.pidd = s.pid))
ELSE count(p.waiter)
END), 0) AS cnt
FROM pairs AS p WHERE p.locker = s.pid::int)AS blocked
UNION ALL
/* add details of lockers */
SELECT
DISTINCT(p.locker::int),
'{}'::text[] AS blocked_by,
(clock_timestamp() - a.xact_start)::interval(0) AS ts_age,
(clock_timestamp() - a.state_change)::interval(0) AS change_age,
NULL::interval AS blocking_age,
a.backend_xid AS xid,
a.backend_xmin AS xmin,
replace(state, 'idle in transaction', 'idletx') AS state,
a.datname,
a.usename,
a.client_addr,
a.application_name,
a.wait_event_type || ':' || a.wait_event AS wait,
blocked.cnt AS blocked_cnt,
true AS root,
trim(trailing ';' from format(
'%s %s%s',
'[' || p.locker::text || ']',
coalesce(repeat('.', 0), '') || ' ',
left(query, 1000)
)) AS query
FROM pairs AS p
JOIN activity AS a ON p.locker::int = a.pid,
LATERAL (SELECT count(t.prev_pid) AS cnt FROM tree t WHERE t.prev_pid = p.locker::text) blocked
WHERE NOT EXISTS (SELECT FROM stat AS s WHERE s.pid::int = p.locker)
/* every other pids not in locker and waiter pairs */
UNION ALL
SELECT
DISTINCT(a.pid),
'{}'::text[] AS blocked_by,
(clock_timestamp() - a.xact_start)::interval(0) AS ts_age,
(clock_timestamp() - a.state_change)::interval(0) AS change_age,
NULL::interval AS blocking_age,
a.backend_xid AS xid,
a.backend_xmin AS xmin,
replace(state, 'idle in transaction', 'idletx') AS state,
a.datname,
a.usename,
a.client_addr,
a.application_name,
a.wait_event_type || ':' || a.wait_event AS wait,
0::int AS blocked_cnt,
true AS root,
trim(trailing ';' from format(
'%s %s%s',
'[' || a.pid::text || ']',
coalesce(repeat('.', 0), '') || ' ',
left(query, 1000)
)) AS query
FROM activity AS a WHERE NOT EXISTS (SELECT FROM pairs AS p WHERE p.locker = a.pid OR p.waiter = a.pid)
ORDER by blocked_cnt DESC, root, pid
)
SELECT
pid,
blocked_by,
ts_age,
change_age,
blocking_age,
xid,
xmin,
state,
datname,
usename,
client_addr,
application_name,
wait,
blocked_cnt,
root,
query
FROM result;
Example output:
pid | blocked_by | ts_age | change_age | blocking_age | xid | xmin | state | datname | usename | client_addr | application_name | wait | blocked_cnt | root | query
-------+-------------------+----------+------------+--------------+--------+--------+--------+---------+---------+-------------+------------------+------------------------------+-------------+------+-------------------------------------------------------
40188 | {} | 00:10:03 | 00:10:00 | <null> | 87324 | <null> | idletx | bolaji | bolaji | <null> | bolaji-psql | Client:ClientRead | 4 | t | [40188] update folder set id = id
40224 | {40188:0} | 00:09:57 | 00:09:56 | 00:09:56 | 87325 | 87324 | active | bolaji | bolaji | <null> | bolaji-psql | Lock:transactionid | 3 | f | [40224] .. delete from folder
40245 | {40188:0,40224:0} | 00:09:51 | 00:09:50 | 00:09:50 | 87326 | 87324 | active | bolaji | bolaji | <null> | bolaji-psql | Lock:relation | 2 | f | [40245] ... alter table folder add column data1 jsonb
40258 | {} | 00:09:41 | 00:09:40 | <null> | 87327 | <null> | idletx | bolaji | bolaji | <null> | bolaji-psql | Client:ClientRead | 1 | t | [40258] drop table sample
40222 | {40245:2} | 00:09:45 | 00:09:45 | 00:09:45 | <null> | 87324 | active | bolaji | bolaji | <null> | bolaji-psql | Lock:relation | 0 | f | [40222] .. select * from folder where id = 1
40253 | {40245:2} | 00:09:47 | 00:09:47 | 00:09:47 | <null> | 87324 | active | bolaji | bolaji | <null> | bolaji-psql | Lock:relation | 0 | f | [40253] .. select * from folder
40260 | {40258:0} | 00:09:38 | 00:09:38 | 00:09:38 | <null> | 87324 | active | bolaji | bolaji | <null> | bolaji-psql | Lock:relation | 0 | f | [40260] .. select * from sample
3640 | {} | <null> | <null> | <null> | <null> | <null> | <null> | <null> | <null> | <null> | | Activity:CheckpointerMain | 0 | t | [3640]
3641 | {} | <null> | <null> | <null> | <null> | <null> | <null> | <null> | <null> | <null> | | Activity:BgWriterHibernate | 0 | t | [3641]
3642 | {} | <null> | <null> | <null> | <null> | <null> | <null> | <null> | <null> | <null> | | Activity:WalWriterMain | 0 | t | [3642]
3643 | {} | <null> | <null> | <null> | <null> | <null> | <null> | <null> | <null> | <null> | | Activity:AutoVacuumMain | 0 | t | [3643]
3645 | {} | <null> | <null> | <null> | <null> | <null> | <null> | <null> | bolaji | <null> | | Activity:LogicalLauncherMain | 0 | t | [3645]
(12 rows)
Time: 4.372 ms
A little explanation:
-
{40188:0}
- translates topid:lock_level
- pid
40245
blocked by{40188:0,40224:0}
- translates to pids40188
on level 0 (the root),40224
also level 0(the root) and40245
on level 1. -
[40224] .. delete from folder
- translates to pid40224
being on level 1 on the lock queue/tree behind the root.
I hope you find the above query useful for lock issue troubleshooting.
Posted on January 13, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.