Should you use an enum column or a table to store allowed values in SQL?

anwar_nairi

Anwar

Posted on October 10, 2023

Should you use an enum column or a table to store allowed values in SQL?

Hello and welcome to this article when I would like to discuss in reaction to a great video released by Aaron (Youtube, Twitter) on Database enums for PlanetScale.

Summary

The use case: invoicing statuses

To setup the stage, let us say you create a platform to put in relation developers and customers. You want to allow developers to generate invoices from their account.

First invoice status version

Our "invoices" table contains a "status" column. The equivalent SQL code for this column would be the following:



CREATE TABLE invoices (
  ...
  status ENUM('draft', 'generated', 'sent', 'paid', 'cancelled')
);


Enter fullscreen mode Exit fullscreen mode

 First problem: editing the name of a status

Let us imagine the naming of "cancelled" is not clear enough, and must be changed for "aborted" to really understand the invoice payment is abandonned.

Let us also imagine you already have a bunch of invoices that are tagged as "cancelled" in your table.

id developer_id number status
1 1 000001 cancelled
2 1 000002 paid
3 1 000003 sent

At this point, you cannot "simply" change the name of the enum, because SQL uses it as an identifier (much like a foreign key) and because of it you are not allowed to alter a value that is already bound to a record in the table.

The best you could do is to:

  1. Add a new value "aborted"
  2. Run a query to move all "cancelled" status to "aborted"
  3. Delete the enum value "cancelled"

Which is a little bit cumbersome for "just" a naming change.

Second problem: ordering

As highlighted by Aaron on the video I mentioned in the introduction of this article, ordering by the status of the invoice will not use a text-based sorting algorithm.

At the creation step, MySQL has associated each values to an integer (starting from 1).

value index
draft 1
generated 2
sent 3
paid 4
cancelled 5

If we would have added the enum values in a different orders, the indexing would be different, which creates an unpredictible outcome when we sort the invoices by their statuses (to some extent).

Third problem: adding "meta data" is harder

Since by design each enums items represents both the value and the identifier, it is harder to add more data around them.

To show this limitation, let us first agree on the possible values an invoices can take before being considered "completed".

Invoice statuses workflow

In practice this means we cannot go from "paid" to "cancelled", they are final statuses.

With an enum value, we cannot really add this information easily. We would either have to add a "final_status" boolean column alongside the "status" enum, which would be a computed virtual column. Again, this seems cumbersome.

The solution: reference tables

Let us recap all the issues we encountered with enums:

  1. The name is not easily editable if it has been bound to an existing record
  2. Ordering is less predictible/not obvious
  3. Adding meta data requires more attention

To solve this issues, one could think of modelizing allowed values by using a reference table instead.

Invoice statuses modelized with a reference table

References tables have the same advantage have enums, meaning you cannot create an invoice with an invoice_status_id that is not part of the reference table: you have the same safety level as enums (as long as you carefully use a foreign key).

If you need to order over the invoice_status_id, it is obvious and clear what will be the expected order. You can even go further and create an order column in the invoice_statuses table if you want to allow back office users to change the order of each allowed values independently.

If you do that, it also shows how easy it is to add "meta data" around your values. As you can see in the diagram, we added the concept of "is_final" pretty easily. Not virtual column needed, just a boolean column that represent if the value is final or not (and no data redundancy, good for Database normalization).

Lastly, if you want to change the name of a status, it is just one step (versus 3 steps if you would have an enum column).

Conclusions

In this article we have seen that enums columns are useful for light/simple need of having a list of allowed values that will not vary in the future.

However, as a future-proof solution, they seem to be rapidly limited and cause a lot of issues when we need something more flexible and evolutive.

As a solution, references tables are a good way to keep all the safety of enums, with a lot of advantages when it comes to add more descriptive / meta data around the values. It is also more easy to edit names and there is a clear separation between the identifier and the name.

References tables are personally my go-to tool when it comes to add references values.

One could say this adds more tables and "complexifies" the modelization, but to me the number of tables is not affecting the performance of queries by a significant factor.

For purists, one last case in favor of references table would be if you allow developers to manually change the status of an invoice (warning: changing the status of an invoice manually is not advised, do not try this at home!).

With an enum column, the name is also the identifier.



<select name="status">
  <option x-for="status in statuses" :value="status">{{ status }}</option>
</select>


Enter fullscreen mode Exit fullscreen mode

The drawback, because the name and the identifier are one, is that if you want to display "Aborted" instead of "Cancelled", you will have to store the change as a data in the application layer, which is another cumbersome step to do. Also think of the translations that you now have to manage...

If you have a clear separation between the identifier and the name, you could easily change the name on the table, and your front code is still the same as below no matter what.



<select name="status">
  <option x-for="status in statuses" :value="status.id">{{ status.name }}</option>
</select>


Enter fullscreen mode Exit fullscreen mode

Now, since status.name comes from your API endpoint, you can do the naming change without changing the status.id, which gives you more flexibility and reduce the maintenance front-side.

I hope this article gave you another point of view when it comes to modelize a list of allowed values in your database. Make the good choice, and happy database modelization!

Cover image by Lukas from Pexels.

💖 💪 🙅 🚩
anwar_nairi
Anwar

Posted on October 10, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related