Soft delete cascade in PostgreSQL🐘 and YugabyteDB🚀

This is a quick example to answer this remark about soft deletes:

I don't know which database Kelly Sommers uses, but PostgreSQL has many features that, combined, help implementing this data processing logic, in a declarative way. I know people don't like to put code in the databases, but this is not business logic. This is pure data logic: implementing soft deletes. SQL has huge benefit for this: it is a declarative language. You declare it once, test it, and you are done. No need for additional code or additional tests.

I've run this on YugabyteDB to verify that it works the same as in PostgreSQL. Of course, no suprise, YugabyteDB re-uses the postgres SQL processing layer, for the best compatibility.

Tables

Here is the parent table which has a parent_deleted timestamp set to the date of deletion. The default, infinity, is for valid records. This column is part of the primary key because there may be multiple deletion for the same parent_id. But only one valid.

CREATE TABLE parent (
 parent_id int, parent_deleted timestamptz default 'infinity',
 primary key (parent_id,parent_deleted)
);

The child table inherits the parent primary key and adds a child_number to it as its primary key. The foreign key is declared with on update cascade as the soft deletes will be cascaded as updates to this primary key.

CREATE TABLE child (
 parent_id int, parent_deleted timestamptz default 'infinity',
 child_number int,
 primary key (parent_id,parent_deleted, child_number),
 foreign key (parent_id,parent_deleted) 
  references parent(parent_id,parent_deleted)
  on update cascade
);

Views

Tables could be sufficient. But the beauty of SQL is the logical independence. I want to query my tables, from the application or by the user, without caring about the soft delete implementation.

I declare views for that. The application will query valid_parent and valid_child to see the current versions, filtering out the soft deleted rows:

create view valid_parent as
 select parent_id from parent where parent_deleted>=now();

create view valid_child as
 select parent_id,child_number from child where parent_deleted>=now();

Thanks to re-using the primary key, there is no need to join the tables there. This is the right choice when deletes are rare (the cascading update overhead is acceptable) but selects are frequent. And people tend to think that joins don't scale.

Procedure

I want to encapsulate this logic in the database and create a procedure to do be called for this soft deletion:

create procedure soft_delete_parent(id int) as $SQL$
update parent
set parent_deleted=now()
where parent_id=id;
$SQL$ language sql;

I'll show an alternative later if you don't like stored procedures. But, personally, I like this procedure encapsulation because the semantic is clear: the application calls a specific procedure.

Data

I'm inserting few rows there. I'm inserting valid rows, and insert them though the view, because a view is a virtual table, with all DML allowed. The default infinity value is set automatically:

insert into valid_parent
 select n from generate_series(1,3) n;

insert into valid_child
 select parent_id,n from valid_parent,generate_series(1,2) n;

Here is a screenshot from my test:

You can easily reproduce it - did you try the YugabyteDB managed free tier?

Test

When you implement data logic in SQL, a simple unit test is usually sufficient, because the database takes care of all multi-user consistency.

select * from valid_parent;

select * from valid_child;

This shows only the valid rows. I call the procedure to soft-delete one parent:

call soft_delete_parent(2);

When querying the views, the rows have been virtually deleted:

select * from valid_parent;

select * from valid_child;

Here is the result:

In the tables behind the views, we can see all the rows, with the soft-deleted ones:

yugabyte=# select * from parent;

 parent_id |        parent_deleted
-----------+-------------------------------
         1 | infinity
         2 | 2022-04-15 10:21:45.635693+00
         3 | infinity
(3 rows)

yugabyte=# select * from child;

 parent_id |        parent_deleted         | child_number
-----------+-------------------------------+--------------
         1 | infinity                      |            1
         1 | infinity                      |            2
         2 | 2022-04-15 10:21:45.635693+00 |            1
         2 | 2022-04-15 10:21:45.635693+00 |            2
         3 | infinity                      |            1
         3 | infinity                      |            2
(6 rows)

Note that, with GRANT and REVOKE, you can give access to the views only, or to these tables. And revoke the right to hard delete.

Rule

You can make this completely transparent, so that users don't have to call the procedure, but simply run DELETE on the view, with a DO INSTEAD code:

create or replace rule soft_delete_parent as 
on delete to valid_parent do instead
update parent
set parent_deleted=now()
where parent_id=old.parent_id;

This is simple. Now any delete will actually do a soft delete:

This looks great, as the application is just interacting with the standard SQL API (SELECT, INSERT, UPDATE, DELETE). And it comes handy when the application cannot be modified. But, for better code quality, I prefer a procedure so that the application developer knows what she does (my procedure name is explicit about soft deletes). You can also see that this RULE is not 100% transparent in its output, showing DELETE 0.

PostgreSQL-compatible

This technique is easy on PostgreSQL and PostgreSQL-compatible databases which re-use the PostgreSQL open-source code, like YugabyteDB. Here is the list of SQL features that makes it easy, declarative, and transparent:

SQL Feature	🐘 PostgreSQL	🚀 YugabyteDB	🪳 CockroachDB	🅾 Oracle
composite PK	✅	✅	✅	✅
default infinity	✅	✅	✅ (4)	❌ (1)
on update cascade	✅	✅	✅	❌ (2)
stored procedure	✅	✅	❌	✅
insert into view	✅	✅	❌	✅
rule / instead of view	✅	✅	❌	✅ (3)
grant/revoke	✅	✅	✅	✅

(1) Temporal Validity is an alternative
(2) Triggers and deferred constraints may be an alternative
(3) Oracle has no equivalent of now() which is the start of transaction
(4) Displayed as 294276-12-31 23:59:59.999999+00

Note that, to be fair, I compared only with the database where I know more than the basics. Here is my guess for SQL Server:

This is where having all PostgreSQL features in YugabyteDB makes it the right solution for many enterprise applications. Even when you don't want to put business logic into the database, there is one day where you will need a stored procedure, triggers, rule, or any of those modern SQL features that have proven their value on monolithic databases and are now available in distributed SQL.

Blog