PGzlan
Posted on August 6, 2023
Table of Contents:
- Introduction
- UPSERT in PostgreSQL
- Using UPSERT in PostgreSQL
- Method 1: INSERT ... ON CONFLICT
- Method 2: MERGE (UPSERT)
- Choosing the Right Method
- Conclusion
- References
Introduction
In the world of databases, updating or inserting data can be a recurring, and complex task. This is especially apparent when dealing with conflicting data or unique constraints. However, Postgres provides a powerful feature called UPSERT.
UPSERT in PostgreSQL
The term upsert is actually the combination of the two words update and in*sert*. When a new row is inserted _into a table, PostgreSQL will perform an _update to the row, provided that it has found an entry for it. If not, then the new row is inserted.
This greatly eases the process of constructing queries, enabling the user to perform updates and inserts using a just one operation.
Using UPSERT in PostgreSQL
To use the upsert feature in PostgreSQL, you can use the INSERT ON CONFLICT
statement as follows:
INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...)
ON CONFLICT (constraint_column)
DO action
The ON CONFLICT target action
clause was added to the INSERT
statement to support the upsert feature.
(Note #1: This is not unique to PostgreSQL. Other RDBMS support it, though the syntax might be different. For example, MySQL supports upserts using the INSERT ... ON DUPLICATE KEY UPDATE statement. Similarly, SQL Server supports upserts using the MERGE statement)
(Note #2: that the ON CONFLICT
clause has been introduced from PostgreSQL 9.5)
In an **upsert **statement in PostgreSQL, the DO
clause specifies the action to take when a conflict occurs during the INSERT
operation. There are two actions that can be specified in the DO
clause of an upsert statement in PostgreSQL:
-
DO NOTHING
: This action specifies that no changes should be made to the table if a conflict occurs. In other words, if a row already exists with the same values as the row being inserted, theINSERT
operation will be skipped and no changes will be made to the table. -
DO UPDATE
: This action specifies that the existing row should be updated if a conflict occurs. You can use theSET
clause to specify which columns should be updated and what their new values should be. TheWHERE
clause can also be used to specify additional conditions for the update.
Another way of performing upserts in Postgres is by using MERGE
(also known as UPSERT
in some other databases).
(Note #3: that the MERGE
statement has been introduced from PostgreSQL 15)
Method 1: INSERT ... ON CONFLICT
The INSERT ... ON CONFLICT
statement allows you to specify a conflict resolution action when a unique constraint violation occurs during an insert operation. Here's the basic syntax:
INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...)
ON CONFLICT (constraint_column)
DO UPDATE SET column1 = value1, column2 = value2, ...
We'll use a simple example to illustrate this. Let's say we have a users
table with columns id
, name
, and email
, where id
is the primary key. We want to insert a new user or update the name and email if the user already exists.
-- Create the users table
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100)
);
-- Insert or update a user
INSERT INTO users (id, name, email)
VALUES (1, 'John Doe', 'john@upsertdemo.com')
ON CONFLICT (id)
DO UPDATE SET name = EXCLUDED.name, email = EXCLUDED.email;
In the above example, if a user with id
1 already exists, the name
and email
will be updated with the new values. Otherwise, a new row will be inserted.
But wait. Where did this EXCLUDED
come from?
In an upsert statement in PostgreSQL, EXCLUDED is a special table that is used to reference the values that were proposed for insertion in the INSERT statement.
The EXCLUDED
table has the same columns as the table being inserted into, and its values are the values that would have been inserted if the INSERT statement had not encountered a conflict.
The EXCLUDED
table is not a physical table that is stored on disk but rather a virtual table that is created and used during the execution of an INSERT ... ON CONFLICT
statement in PostgreSQL and exists only in memory during the execution of the INSERT ... ON CONFLICT
statement.
Method 2: MERGE (UPSERT)
The MERGE
statement allows you to perform both update and insert operations based on a specified condition.
Here's the basic syntax of the MERGE
statement:
MERGE INTO target_table AS target
USING source_table AS source
ON (condition)
WHEN MATCHED THEN
UPDATE SET column1 = value1, column2 = value2, ...
WHEN NOT MATCHED THEN
INSERT (column1, column2, ...)
VALUES (value1, value2, ...)
Let's make use of the same example as before and showcase the MERGE
statement:
-- Create the users table
CREATE TABLE users (
...
);
-- Merge (UPSERT) a user
MERGE INTO users AS target
USING (VALUES (1, 'John Doe', 'john@upsertdemo.com')) AS source (id, name, email)
ON (target.id = source.id)
WHEN MATCHED THEN
UPDATE SET name = source.name, email = source.email
WHEN NOT MATCHED THEN
INSERT (id, name, email)
VALUES (source.id, source.name, source.email);
In the above example, the MERGE
statement first checks if a user with id
1 exists in the users
table. If it does, the name
and email
will be updated with the new values. Otherwise, a new row will be inserted.
Choosing the Right Method
Both INSERT ... ON CONFLICT
and MERGE
provide UPSERT functionality, but their usage depends on your specific requirements and the version of Postgres you are using. If you are working with Postgres versions prior to 9.5, INSERT ... ON CONFLICT
is not available, and you should use alternative approaches like UPDATE
followed by INSERT
in separate statements.
If you have Postgres 9.5 or greater (but not 15), You can leverage the power of INSERT ... ON CONFLICT
. If you are on Postgres 15 or higher, then the world is your oyster.
The choice between them depends on factors such as performance, complexity, and personal preference. It's recommended to benchmark and compare the performance of both methods with your specific workload and dataset to make an informed decision.
Conclusion
Updating or inserting data in a database can be a cumbersome task, especially when dealing with conflicting data. However, Postgres simplifies this process with its UPSERT functionality. In this blog post, we explored two methods to achieve UPSERT in Postgres: INSERT ... ON CONFLICT
and MERGE
.
We learned that INSERT ... ON CONFLICT
allows you to specify a conflict resolution action when a unique constraint violation occurs during an insert operation. This method is suitable for Postgres versions prior to 9.5. On the other hand, MERGE
(UPSERT) is introduced in Postgres 9.5 and provides a more concise and flexible syntax for performing both update and insert operations based on a specified condition.
Choosing the right method depends on your specific requirements and the version of Postgres you are using. It's important to consider factors such as performance, complexity, and personal preference when making the decision.
With the UPSERT functionality in Postgres, you can simplify your database operations, ensure data consistency, and streamline your application logic. Whether you choose INSERT ... ON CONFLICT
or MERGE
, UPSERT empowers you to efficiently handle conflicting data and keep your database up to date.
Happy UPSERTing!
References
UPSERT - PostgreSQL wiki.
Commit Fest
PostgreSQL: Documentation: 15: MERGE.
SQL MERGE - PostgreSQL wiki
postgresql - Difference between UPSERT and MERGE
PostgreSQL Merge | Quick Glance on PostgreSQL Merge - EDUCBA.
[How can I return EXCLUDED from WITH in PostgreSQL?.](
Posted on August 6, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
November 29, 2024
November 29, 2024
November 29, 2024
November 28, 2024