Global Deadlock Resolution in GBase 8c Transactions and Locks

congcong

Cong Li

Posted on June 24, 2024

GBase 8c provides mechanisms for deadlock detection and automatic resolution. A GBase 8c cluster comprises multiple CNs (Coordinator Nodes) and DNs (Data Nodes). A deadlock can occur within a single CN or DN, or across multiple CNs or DNs. A deadlock that spans multiple CNs or DNs is called a global deadlock: processes on several database nodes in the cluster wait for each other's resources in a cycle. This article focuses on how distributed global deadlocks are resolved.

[Figure: timeline of Transaction 1 and Transaction 2 from T1 to T4, ending in a global deadlock]

As the figure above shows, Transaction 1 begins (begin) at time T1. At T2, Transaction 1 updates (update) the t column of the row with id=1, and Transaction 2 begins. At T3, Transaction 2 updates the t column of the row with id=4. At T4, Transaction 1 tries to update t for id=4 while Transaction 2 tries to update t for id=1. Each transaction now waits for a row lock held by the other, which produces a global deadlock.
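To make the timeline concrete, here is a minimal Python sketch that replays the four update requests against an in-memory lock table. It is only a model for illustration, not GBase 8c code: the function `run_timeline` and the names `txn1` and `txn2` are hypothetical, and row locks are reduced to a simple "first writer holds the row" rule.

```python
def run_timeline(steps):
    """Replay (transaction, row_id) update requests in order.

    Returns (owners, waits): which transaction holds each row lock,
    and which transaction each blocked transaction is waiting for.
    """
    owners = {}  # row id -> transaction holding the row lock
    waits = {}   # blocked transaction -> transaction it waits for
    for txn, row in steps:
        holder = owners.get(row)
        if holder is None or holder == txn:
            owners[row] = txn    # lock is free (or already ours): granted
        else:
            waits[txn] = holder  # lock is held by someone else: block
    return owners, waits


steps = [
    ("txn1", 1),  # T2: Transaction 1 updates id=1
    ("txn2", 4),  # T3: Transaction 2 updates id=4
    ("txn1", 4),  # T4: Transaction 1 requests id=4, held by Transaction 2
    ("txn2", 1),  # T4: Transaction 2 requests id=1, held by Transaction 1
]
owners, waits = run_timeline(steps)
print(waits)  # {'txn1': 'txn2', 'txn2': 'txn1'}
```

The two entries in `waits` point at each other, which is exactly the mutual wait described above.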

Global deadlock detection algorithms fall into two main categories, centralized and distributed:

1. Centralized: The GTM (Global Transaction Manager) node collects lock-wait information from the other nodes in the cluster and builds a global wait-for graph. It then searches the graph for cycles (for example with depth-first search or topological sorting) and issues commands to terminate transactions involved in a deadlock. This approach concentrates load on the GTM node, which can become a performance bottleneck for the cluster. In addition, if the GTM node fails, deadlock detection stops working, so this approach is generally not recommended.

2. Distributed (currently used in GBase 8c): Each CN initiates deadlock detection independently. Detection messages propagate along the wait-for relationships between transaction-processing threads across nodes. If a transaction-processing thread receives the detection message that it sent itself, a global deadlock exists, and that transaction rolls back to resolve it.
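The centralized approach in item 1 can be sketched in a few lines of Python. This is an illustrative model only, not how any GTM is actually implemented: `merge_reports` and `find_cycle` are hypothetical helper names, and each node's report is assumed to be a dictionary mapping a waiting transaction to the set of transactions it waits on.

```python
def merge_reports(reports):
    """Combine per-node lock-wait reports into one global wait-for graph."""
    graph = {}
    for edges in reports:
        for waiter, holders in edges.items():
            graph.setdefault(waiter, set()).update(holders)
    return graph


def find_cycle(wait_for):
    """Depth-first search for a cycle; returns the cycle as a list, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in sorted(wait_for.get(node, ())):
            state = color.get(nxt, WHITE)
            if state == GRAY:                  # back edge: cycle found
                return path[path.index(nxt):]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in sorted(wait_for):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


# Each node only sees one half of the wait; the merged graph shows the cycle.
reports = [{"txn1": {"txn2"}}, {"txn2": {"txn1"}}]
graph = merge_reports(reports)
print(find_cycle(graph))  # ['txn1', 'txn2']
```

Neither report contains a cycle on its own, which is why a purely local detector on a single node would miss this deadlock.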

[Figure: wait messages exchanged between CN1 and CN2 during distributed deadlock detection]

Example:

When Transaction 1 finds that the data it wants to update is locked by another transaction, it sends a wait message to the node holding the lock, in this example CN2, where Transaction 2 originated. Likewise, when Transaction 2 finds that the data it needs is locked by Transaction 1, it sends a wait message to CN1, where Transaction 1 is running. Both transactions then wait for a predetermined timeout period. If either node detects the wait cycle within this period, the node that initiated the detection aborts its transaction, which resolves the global deadlock.
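The message flow in this example can be modeled with a small Python sketch. It is a simplified simulation under stated assumptions, not GBase 8c's actual protocol: the function `probe_returns` and the labels `txn1@CN1` and `txn2@CN2` are hypothetical, and a local queue stands in for the network messages that real nodes would exchange.

```python
from collections import deque


def probe_returns(initiator, waits_for):
    """Forward a detection message along wait-for edges.

    Returns True if the message comes back to the transaction that
    sent it, which indicates a wait cycle through that transaction.
    """
    pending = deque(waits_for.get(initiator, ()))  # first hop: lock holders
    visited = set()
    while pending:
        receiver = pending.popleft()
        if receiver == initiator:
            return True        # initiator received its own message
        if receiver in visited:
            continue           # already forwarded from this transaction
        visited.add(receiver)
        # A blocked receiver forwards the message to whoever blocks it.
        pending.extend(waits_for.get(receiver, ()))
    return False


waits_for = {
    "txn1@CN1": ["txn2@CN2"],  # Transaction 1 waits on Transaction 2
    "txn2@CN2": ["txn1@CN1"],  # Transaction 2 waits on Transaction 1
}
print(probe_returns("txn1@CN1", waits_for))  # True
```

In this model the transaction whose message returns would be the one to roll back, matching the description above that the node initiating the detection exits its transaction.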

Testing Method:

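The default deadlock_timeout is 1 second. For this test, raise it to 20 seconds so that the wait is easy to observe:

show deadlock_timeout;
alter system set deadlock_timeout = '20s';  -- state the unit explicitly; a bare 20 may be read as milliseconds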

Create Test Table

create table test(id int,info text);
insert into test values(1,'Tom');
insert into test values(2,'Lane');

Session 1: Execute Update

begin;
update test set info = 'test' where id = 1;

Session 2: Execute Update

begin;
update test set info = 'test' where id = 2;

Session 1: Execute Update

update test set info = 'test' where id = 2;  -- blocks: the row with id = 2 is locked by session 2

Session 2: Execute Update

update test set info = 'test' where id = 1;  -- blocks: the row with id = 1 is locked by session 1

After about 20 seconds (the configured deadlock_timeout), one session detects the deadlock and its transaction is terminated. The other session's blocked update then proceeds, and that session can commit successfully.
