🏁🐘Winning Race Conditions With PostgreSQL
Randall
Posted on February 17, 2024
Race conditions suck! They can be extremely difficult to debug, and often only occur naturally in production at the most critical moments (times of high traffic). Fortunately, many databases (such as PostgreSQL) give us powerful tools to manage and avoid race conditions.
In this article, we will do a deep dive into how to avoid race conditions while using PostgreSQL, focusing especially on enforcing uniqueness constraints. We'll start with some simpler cases, then move on to more complex ones, so feel free to skip down depending on your level of experience.
This article assumes that you are familiar with race conditions, why they happen, and why they suck. (If you aren't familiar, don't worry: you'll learn the hard way soon enough! We all do.) The later cases also assume at least an intermediate knowledge of relational databases.
This article will use the Node.js pg library for examples.
Let's just dive right in and look at some different cases where our code and data are vulnerable to race conditions, and find out how to fix them.
Case 1: Simple Uniqueness
Most applications do not allow two users to share the same email address. If a user tries to register with an email already registered by another account, registration should fail.
One might write such a uniqueness check in JavaScript like this:
async function registerUser(email, hashedPassword) {
  // Check whether a user with this email already exists
  const existingUserResult = await pool.query(
    'SELECT 1 FROM "user" WHERE email = $1;',
    [email],
  );
  if (existingUserResult.rows.length > 0) {
    throw new Error('User already exists');
  }
  await pool.query(
    'INSERT INTO "user" VALUES ($1, $2);',
    [email, hashedPassword],
  );
}
But this code is vulnerable to race conditions. If two users try to sign up at the exact same time with the same email, then it's possible for a sequence of events like this to happen:
- User #1 SELECT operation
- User #2 SELECT operation
- User #1 INSERT operation
- User #2 INSERT operation
Since the SELECT operations happen for both users before the INSERT happens for either one, both users will survive the duplicate check and proceed to the INSERT operation. The nature of asynchronous programming allows such sequences of events.
Fortunately, this problem is really easy to fix: we just need to add a UNIQUE constraint to the email column in the user table. We can do this when creating the table:
CREATE TABLE "user" (
"email" VARCHAR UNIQUE NOT NULL,
"hashedPassword" NOT NULL
);
Or later, by using ALTER TABLE.
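For instance, something like this (the constraint name "user_email_unique" is just an arbitrary label we're choosing here):

ALTER TABLE "user"
  ADD CONSTRAINT "user_email_unique" UNIQUE ("email");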
By setting a UNIQUE constraint on the email column, we are telling the database that no two rows may have the same value for email, and the database itself will enforce this for us. Even if the two INSERTs happen at exactly the same time, concurrency features in the database guarantee that only one will succeed and the other will error.
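With the constraint in place, the application code can simply attempt the INSERT and handle the failure, rather than SELECTing first. Here's a rough sketch; PostgreSQL reports unique violations with SQLSTATE 23505, which the pg library exposes as err.code:

async function registerUser(email, hashedPassword) {
  try {
    await pool.query(
      'INSERT INTO "user" VALUES ($1, $2);',
      [email, hashedPassword],
    );
  } catch (err) {
    // '23505' is PostgreSQL's SQLSTATE for unique_violation
    if (err.code === '23505') {
      throw new Error('User already exists');
    }
    throw err;
  }
}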
Case 2: Compound Uniqueness
Let's say we're running a multi-user blogging application, like Dev.to, and we want to allow users to create one highlighted post per week. Our posts table might look like this:
CREATE TABLE "posts" (
"userId" INT NOT NULL,
"createdAt" TIMESTAMP WITHOUT TIMEZONE NOT NULL,
"highlighted" BOOLEAN NOT NULL,
"postContent" TEXT NOT NULL
);
We could write code similar to the first example, with a uniqueness check like this:
const existingWeeklyHighlightedPostResult = await pool.query(
  `
  SELECT 1
  FROM "posts"
  WHERE
    "userId" = $1
    AND
    "highlighted" IS TRUE
    AND
    /* "createdAt" is greater than the first moment
     * of the current week */
    "createdAt" >= DATE_TRUNC('week', NOW());
  `,
  [userId],
);
if (existingWeeklyHighlightedPostResult.rows.length > 0) {
  throw new Error('You already have a highlighted post this week');
}
However, this would have the same issue. If a user submits two highlighted posts at the same time, both of them might pass this uniqueness check. Maybe the user accidentally double-clicked the submit button, or maybe they're trying to exploit this to get more ad revenue 🤷
Like last time, can we create a UNIQUE constraint to help us? Yes! Even though we store an exact timestamp rather than the week that a post was created in, PostgreSQL's support for indexes on expressions gives us what we need. Further, we need to use the partial index feature to only enforce this constraint on highlighted posts:
CREATE UNIQUE INDEX "one_highlighted_post_per_week_constraint"
ON "posts" ("userId", DATE_TRUNC('week', "createdAt"))
WHERE "highlighted" IS TRUE;
With this UNIQUE index, the database will not allow two rows to both have "highlighted" IS TRUE and share the same combination of "userId" and DATE_TRUNC('week', "createdAt"). In other words, a user may only have one highlighted post per week.
Any rows where "highlighted" IS FALSE are exempt from this constraint, and we can insert as many of them as we want.
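To sanity-check the behavior, here's a hypothetical sequence of inserts for user 42 within a single week; the second highlighted insert fails with a unique_violation error:

-- First highlighted post of the week: succeeds
INSERT INTO "posts" VALUES (42, NOW(), TRUE, 'First highlight!');
-- Second highlighted post in the same week: fails (unique_violation)
INSERT INTO "posts" VALUES (42, NOW(), TRUE, 'Second highlight!');
-- Non-highlighted posts are exempt: succeeds
INSERT INTO "posts" VALUES (42, NOW(), FALSE, 'Just a regular post');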
Case 3: Compound Multiple Uniqueness
Same case as above, but rather than one highlighted post per week, we want to allow users to make three highlighted posts per week. Can we do this with a UNIQUE constraint like we did above?
Once again, yes, but the solution here may be a bit more convoluted. First, we create the same posts table, but instead of a BOOLEAN highlighted column we add an INT weeklyHighlightedPostNumber column:
CREATE TABLE "posts" (
"userId" INT NOT NULL,
"createdAt" TIMESTAMP WITHOUT TIME ZONE NOT NULL,
"weeklyHighlightedPostNumber" INT,
"postContent" TEXT NOT NULL
);
If a post is highlighted, then its "weeklyHighlightedPostNumber" will be an integer. If the post is not highlighted, "weeklyHighlightedPostNumber" will be NULL.
Now we add a constraint that forces weeklyHighlightedPostNumber to be a number between 1 and 3, if it's not NULL:
ALTER TABLE "posts" ADD CONSTRAINT num_weekly_posts_constraint CHECK
(
("weeklyHighlightedPostNumber" IS NULL)
OR
("weeklyHighlightedPostNumber" BETWEEN 1 AND 3)
);
Now we can add a UNIQUE constraint:
CREATE UNIQUE INDEX "three_highlighted_posts_per_week_constraint"
ON "posts" (
"userId",
DATE_TRUNC('week', "createdAt"),
"weeklyHighlightedPostNumber"
)
WHERE "weeklyHighlightedPostNumber" IS NOT NULL;
This will enforce that highlighted posts (rows with a non-NULL "weeklyHighlightedPostNumber") may not share the same combination of "userId", DATE_TRUNC('week', "createdAt"), and "weeklyHighlightedPostNumber". Since the earlier constraint requires "weeklyHighlightedPostNumber" to be between 1 and 3, this limits us to 3 highlighted posts per week per user.
This does mean that when inserting a post, you need to figure out the next available post number. We can do this with a bit of SQL in our INSERT operation. This solution also handles gaps (for example if you have three highlighted posts and delete the second one). It does start getting a little hairy, but check it out:
async function createHighlightedPost(userId, content) {
  const insertResult = await pool.query(
    `
    WITH next_highlighted_post_num AS MATERIALIZED (
      -- Find the lowest post number (1-3) not yet used by this
      -- user during the current week; yields zero rows if all
      -- three numbers are taken
      SELECT series_num
      FROM GENERATE_SERIES(1, 3) AS series_num
      WHERE NOT EXISTS (
        SELECT *
        FROM posts
        WHERE
          posts."userId" = $1
          AND
          DATE_TRUNC('week', NOW()) <= posts."createdAt"
          AND
          posts."weeklyHighlightedPostNumber" = series_num
      )
      ORDER BY series_num
      LIMIT 1
    )
    INSERT INTO posts
      ("userId", "createdAt", "weeklyHighlightedPostNumber", "postContent")
    SELECT $1, NOW(), series_num, $2
    FROM next_highlighted_post_num;
    `,
    [userId, content],
  );
  if (insertResult.rowCount === 0) {
    throw new Error('Could not create highlighted post');
  }
}
Of course, you could instead just write a simple SELECT query to get all currently existing "weeklyHighlightedPostNumber"s for the given user and current week, and write JavaScript code to choose the new row's "weeklyHighlightedPostNumber" to INSERT.
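A minimal sketch of that alternative might look like the following; note that the UNIQUE index still backstops us, so if two parallel requests race to the same number, one INSERT will simply error rather than exceeding the limit:

async function createHighlightedPost(userId, content) {
  // Fetch the highlight numbers already used this week
  const usedResult = await pool.query(
    `
    SELECT "weeklyHighlightedPostNumber" AS num
    FROM posts
    WHERE
      "userId" = $1
      AND "weeklyHighlightedPostNumber" IS NOT NULL
      AND "createdAt" >= DATE_TRUNC('week', NOW());
    `,
    [userId],
  );
  const used = new Set(usedResult.rows.map((row) => row.num));
  // Choose the lowest unused number between 1 and 3
  const postNumber = [1, 2, 3].find((n) => !used.has(n));
  if (postNumber === undefined) {
    throw new Error('Could not create highlighted post');
  }
  await pool.query(
    'INSERT INTO posts VALUES ($1, NOW(), $2, $3);',
    [userId, postNumber, content],
  );
}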
Case 4: Compound Referential Uniqueness
This case will be similar to case #2, but instead of "one highlighted post per week", we'll modify the condition slightly to "must wait 7 days before making another highlighted post".
In case #2, if a user makes a highlighted post on a Wednesday, they're next able to make one at the start of the next week (Monday).
But here in case #4, if a user makes a highlighted post on a Wednesday, they have to wait until the same time next Wednesday before they can make another.
This would require any constraint to reference the user's previous highlighted post creation date, which is something that typical UNIQUE constraints simply cannot do. To solve this we'll need to bust out some extra features: transactions and advisory locks.
This will be the final and trickiest case that we look at.
We will use the same table schema as in case #2:
CREATE TABLE "posts" (
"userId" INT NOT NULL,
"createdAt" TIMESTAMP WITHOUT TIMEZONE NOT NULL,
"highlighted" BOOLEAN NOT NULL,
"postContent" TEXT NOT NULL
);
And here is a solution in JavaScript:
async function createHighlightedPost(userId, content) {
  const postInsertLockNamespace = 5000;
  const queryRunner = await pool.connect();
  try {
    await queryRunner.query('BEGIN TRANSACTION;');
    // Take a per-user advisory lock; it is released automatically
    // when the transaction commits or rolls back
    await queryRunner.query(
      'SELECT PG_ADVISORY_XACT_LOCK($1, $2);',
      [postInsertLockNamespace, userId],
    );
    // This must run on the same connection that holds the
    // transaction and the lock, not on the pool
    const existingWeeklyHighlightedPosts = await queryRunner.query(
      `
      SELECT 1
      FROM posts
      WHERE
        "userId" = $1
        AND
        highlighted IS TRUE
        AND
        "createdAt" >= NOW() - INTERVAL '1 week'
      `,
      [userId],
    );
    if (existingWeeklyHighlightedPosts.rows.length > 0) {
      throw new Error(
        'Already have a highlighted post in the previous seven days'
      );
    }
    await queryRunner.query(
      'INSERT INTO "posts" VALUES ($1, $2, $3, $4);',
      [userId, new Date(), true, content],
    );
    await queryRunner.query('COMMIT');
  } catch (err) {
    await queryRunner.query('ROLLBACK');
    throw err;
  } finally {
    queryRunner.release();
  }
}
For this to work, anything which is going to modify (INSERT, UPDATE, or DELETE) records in the posts table should first SELECT PG_ADVISORY_XACT_LOCK(5000, userId);. This takes a lock, and any other transaction that tries to take a lock with the same arguments will have to wait for it to be released. If we're careful to always take this lock before modifying posts, we can trust that after we take the lock, nothing else will modify posts for that user until after we release the lock. This means that after we run the SELECT statement to get existingWeeklyHighlightedPosts, we know that its result will remain correct until after our INSERT completes and the transaction commits. This effectively prevents a user from submitting a highlighted post if they have already submitted one in the past seven days, even if they are spamming us with parallel post requests.
However, it can be hard to make sure that your code is well-behaved and always takes the lock before modifying posts, especially if other developers are working on the same codebase (or some psycho (totally not yourself, of course) logs in with pgAdmin and runs random queries!). If anyone does not properly take the lock before inserting, then this approach breaks.
To help a little bit, we can create a trigger that automatically takes the lock whenever you INSERT, UPDATE, or DELETE rows in this table. That would look like this:
CREATE FUNCTION "take_post_modify_lock_function"()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS $$
BEGIN
  -- OLD is only defined for UPDATE and DELETE, and NEW only for
  -- INSERT and UPDATE, so branch on the operation type
  IF TG_OP IN ('UPDATE', 'DELETE') THEN
    PERFORM PG_ADVISORY_XACT_LOCK(5000, OLD."userId");
  END IF;
  IF TG_OP IN ('INSERT', 'UPDATE') THEN
    PERFORM PG_ADVISORY_XACT_LOCK(5000, NEW."userId");
  END IF;
  -- A BEFORE DELETE trigger must return OLD; returning NULL
  -- would silently skip the delete
  IF TG_OP = 'DELETE' THEN
    RETURN OLD;
  END IF;
  RETURN NEW;
END;
$$;
CREATE TRIGGER "take_post_modify_lock_trigger"
BEFORE INSERT OR UPDATE OR DELETE ON "posts"
FOR EACH ROW
EXECUTE PROCEDURE "take_post_modify_lock_function"();
This trigger will force any modification of posts to take the lock, so you don't need to remember to do it in code when you modify posts.
You do still need to do it in code before SELECTing to verify your constraint condition, though. There is no such thing as a SELECT trigger, and even if there were, taking the lock would be wasteful for queries that don't really need to take it (like when you're fetching a list of posts to display on the homepage). If you don't take the lock before SELECTing, then someone else might get the lock and do an INSERT before you do the INSERT you're planning to do based on the results of the SELECT.
‡ The 5000 chosen above for the first argument to PG_ADVISORY_XACT_LOCK() is arbitrary. We could have chosen any other number. What's important is that it should be different from any other types of locks you create, so that locks with different meanings don't overlap. For example, if we had another table comments and wanted to do a similar thing with locks, we could use 5001 for it.
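One lightweight way to keep these namespaces from colliding is to define them all in one place in your application code. A tiny illustrative sketch (the names and helper here are hypothetical, not part of any library):

// Central registry of advisory lock namespaces, so locks with
// different meanings never share the same first argument
const LOCK_NAMESPACES = {
  POST_INSERT: 5000,
  COMMENT_INSERT: 5001,
};

async function takePostInsertLock(queryRunner, userId) {
  await queryRunner.query(
    'SELECT PG_ADVISORY_XACT_LOCK($1, $2);',
    [LOCK_NAMESPACES.POST_INSERT, userId],
  );
}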
SERIALIZABLE Transaction Isolation Level
There's actually some secret alternative magic sauce that can make this all work without any explicit locking, and that's the SERIALIZABLE transaction isolation level. Most SQL databases have this, but it's often a bit different between them. For example PostgreSQL's version provides much stronger guarantees than MySQL's version (which can be a little deceptive).
When you use the SERIALIZABLE transaction isolation level and you do a SELECT inside of a transaction, PostgreSQL "remembers" what you SELECTed, and if any data changes before the transaction finishes in a way that would cause your SELECT query to have returned a different result, then your transaction will get a "serialization failure" and have to ROLLBACK.
I cannot stress enough that this is an extremely powerful feature, and if you simply switch it on as the default (ALTER DATABASE <DATABASE NAME> SET DEFAULT_TRANSACTION_ISOLATION TO SERIALIZABLE;) and consistently use transactions, then you can write your code without having to think about explicit locking. However, there are a number of guidelines and pitfalls you have to be aware of:
- Use explicit transactions (a BEGIN TRANSACTION; statement) for any operation that executes more than one SQL statement (such as the createHighlightedPost() JavaScript function we looked at above).
- Avoid running any transactions with lower isolation levels, and be very careful if you do, since it can be harder to reason about how transactions with different isolation levels interact.
- Be ready to retry transactions, since you may encounter serialization failures pretty regularly; this is normal and expected (see the retry sketch after this list).
- Keep transactions as short as possible, since having more transactions running at any given time will increase the chance of serialization failures.
- Understand that the predicate locking used by SERIALIZABLE transactions does introduce some non-trivial overhead.
- Be aware that certain patterns don't work well under the SERIALIZABLE transaction isolation level. For example, using SELECT ... FOR UPDATE SKIP LOCKED to implement a highly concurrent queue will not work (you'll just get bombarded with serialization failures).
- Make sure you have indexes that support your queries, since transactions that have to do full table scans can greatly increase the number of serialization failures.
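To make the "be ready to retry" guideline concrete, here's a rough sketch of a retry wrapper; PostgreSQL reports serialization failures with SQLSTATE 40001, which the pg library exposes as err.code (the helper name is my own):

async function withSerializationRetry(operation, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // '40001' is PostgreSQL's SQLSTATE for serialization_failure;
      // the transaction was rolled back, so it's safe to try again
      if (err.code === '40001' && attempt < maxAttempts) {
        continue;
      }
      throw err;
    }
  }
}

// Usage, wrapping the whole transactional operation:
// await withSerializationRetry(() => createHighlightedPost(userId, content));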
Conclusion
Sometimes having to worry about race conditions is a "good problem to have", since it suggests that you have a lot of users using your system. Hopefully they're paying you!
But as the user base grows, race conditions can cause more and more mysterious and crippling faults, and can sometimes even be caused on purpose by exploiters.
I hope this article gave you some new techniques to deal with these cases, and if you know of any other good ones, please share in the comments!
While writing this, I had a billion ideas for additional cases, footnotes, and asterisks, but couldn't include it all without making quite a mess. If you have any questions or are interested in additional cases, feel free to comment!