Alexey
Posted on October 27, 2021
This post is about an interesting problem that relates to crossing the boundary between a SQL database and code; spoiler alert: it has to do with a little known yet broadly relevant security pitfall.
The situation
Suppose you have a user database with extremely sensitive data, and also data that is less sensitive.
You wish to protect your most sensitive data, so you restrict access to it to some database users. Only privileged database credentials - available to a few micro-services and employees at the company - can grant read access to the data.
For example, in a typical SQL database like Postgres, this is possible through GRANT/REVOKE
:
GRANT SELECT ON TABLE secret_user_data TO privileged_user;
REVOKE SELECT ON TABLE secret_user_data FROM basic_user;
GRANT ALL PRIVILEGES ON TABLE basic_user_data TO basic_user;
GRANT ALL PRIVILEGES ON TABLE users TO basic_user;
Now imagine how the primary keys work on those tables: users
has an ID for sure; secret_user_data
and basic_user_data
are 1-to-1 with users via a user_id
foreign key - so do they really need their own primary key?
I would argue no, it would never be useful, because the user_id
column fully determines the row; you'd never need to query on a separate id
field. So these side tables should inherit the primary key from users
.
The problem
Now we're going to be writing this data from a server - be it Node, Sinatra, Flask, or any framework/language whatsoever.
The problem that exposes is that of mass assignment. For example, the Ruby ORM library Sequel goes to some lengths to deal with it - but most popular frameworks are in a similar boat.
Mass assignment can happen when user input is used in write queries. For example, if your query builder takes in maps, and your POST endpoint accepts JSON which is converted to a map, perhaps whatever you put into the map gets written into the database.
A particularly nasty accident can happen if the primary key field (in our case, user_id
) is overriden (imagine something like
db.secret_user_data.update({user_id: currentUser.id}, data)
, where data = {user_id: -5}
). Our side table entries would be orphaned from their users, and our data scrambled.
Solution
Sequel's solution is to "restrict primary keys": throw an exception if a primary key is included in a write. Perhaps your framework doesn't have this type of security measure in place - perhaps you even have the vulnerability somewhere in your codebase! But I'm sure all members of all your teams are responsible about writing only select fields and not the literal user input, even when pressed against deadlines.
In the case I outlined, it's not possible to restrict primary keys. We need to actually write the primary key when we create the entry - but we definitely want the protection against overriding them after they are created.
Luckily, this is easy to achieve at the database level. For example, in Postgres:
CREATE OR REPLACE FUNCTION immutable_id()
RETURNS TRIGGER AS
$BODY$
BEGIN
-- Alternatively, we can just fail silently
-- NEW.id = OLD.id;
IF NEW."id" IS DISTINCT FROM OLD."id"
THEN
RAISE EXCEPTION 'Refusing to update id column';
END IF;
RETURN NEW;
END;
$BODY$
LANGUAGE PLPGSQL;
CREATE TRIGGER secret_user_data_immutable_user_id
BEFORE UPDATE OF "user_id"
ON "secret_user_data"
FOR EACH ROW
EXECUTE PROCEDURE immutable_id();
No restricted primary keys, no mass-assignment ID override vulnerability - win-win! We can have our granular security, minimal id structure, and any syntax we desire w/o fear of scrambling our data.
The rest of the fields should probably still receive application-level care - but luckily that just means there's still a job for good engineers!
Posted on October 27, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.