Database constraints considered harmful?

jonlauridsen

Jon Lauridsen

Posted on April 13, 2022

Database constraints considered harmful?

In relational databases it's common practice to create constraints to lock down the data and its relationship to other tables. So, a Users table might have an email column that’s marked as unique because everyone has their own email, right? And a Subscription table gets a non-nullable User column because every Subscription must be associated with a User. Yeah, that also sounds reasonable.

But I'm here to argue against constraints! To some that might sound heretical (or at least very stupid), but hear me out:

Almost all constraints exist for two reasons: For performance (it's a lot quicker to search a uniquely-indexed column), and to guarantee data integrity (so we physically can’t enter invalid data, e.g. the business requires a subscription to be tied to a user).

Here's the problem: We programmers often optimize too early, and constraints don't actually express business requirements!

Let me break that down a bit:

  • On the topic of performance, there are of course valid cases where constraints speed up queries by several magnitudes, but unless we see that performance problem and its subsequent remediation with a constraint we run the risk of prematurely optimizing. Fewer constraints would make the code more malleable, and create less friction for making subsequent changes to the database structure.

    Instead defer database-optimizations until they are needed, to counteract our tendency for premature optimizations.

  • Constraints due to data integrity are more directly harmful, but it's also the hardest point to convey to programmers: No amount of unique indices, non-nullable fields, or even stored procedures can or should model the full set of business requirements. There's a reason we don't program in database-languages; we program in general-purpose languages such as C#, Python, Javascript, etc.

    Ultimately business requirements must be expressed in the programming-layer, but they are not required in the storage-layer. And so we must ask ourselves: Why partially replicate those requirements in the DB?

At this point we need to discuss the drawbacks of constraints, because they don't come for free. In fact the more constraints the more difficult a system becomes to test and the less malleable it becomes:

  • Constraints couple concepts together, making it necessary to arrange multiple sub-entities in order to create the one valid entity we want to test. I.e. we can't just create a Subscription, we must also now create a User. And if our test doesn't deal with users that's just a waste of complexity. Then we might start hiding that complexity in entity factories, but those are just abstractions on top of self-inflicted complexity. Why not get rid of the complexity entirely?
  • And those constraints make it a lot harder to adjust to new requirements: What if we actually want subscriptions without a user? What if an email column doesn't have to be unique? If the data-storage layer imposes artificial constraints it dramatically slows down experiments: Instead of just changing some code, that experiment now requires much more complex and dangerous migrations.

For some, what I'm writing here is heresy or deeply stupid or both. But think about it: Constraints are business requirements that almost-kinda-sorta duplicates what the code already does. Instead, imagine if we have perfect trust in the code, what advantage do constraints actually add? What does it really give us to mark a Subscription's User column as non-nullable, when the code already ensures a Subscription is only ever created in relation to a User? It actually amounts to no help at all because constraints are simply not the source of truth.

Instead, try not adding constraints, and focus on ensuring great clarity and trust in the code. I know this can sound extreme, but it really isn't that bad.

For me personally it's even an open question whether foreign keys are worth it, unless we're clearly entering into performance problem territory. There's absolutely nothing wrong with code making multiple database calls and merging datasets in memory, as long as performance isn't impacted.

I think many of us create constraints because it feels neat that way, it feels… optimal? And it's how we're trained to do it. But none of us are realistically implementing 100% of business rules in the database, so, where do we draw the line? Why do we only ever add the easy constraints, but not fully map out the actual requirements?

Instead let the code express the business requirements, and let the database focus on storing and retrieving data.

Photo by Sam Moqadam on Unsplash

💖 💪 🙅 🚩
jonlauridsen
Jon Lauridsen

Posted on April 13, 2022

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related