Solutions for storing state in a database

danielbenzie

Daniel

Posted on April 19, 2022

Solutions for storing state in a database

When designing an application, you will often need to model 'state'. State, in this sense, is related directly to a single entity. An example may be:

I have an order, and the possible status' of the order can be:

  • Pending
  • Paid
  • Dispatched
  • Delivered
  • Cancelled
  • Refunded

The order may only be associated with a single state at any time. It would not make logical sense for an order to both be delivered and returned.

Transitioning between these states will drive the behaviour of your service. For example, when an order transitions to dispatched, it is an ideal time to notify the customer of the delivery tracking details.

When modelling this type of solution, engineers are often tempted by the simplicity of having a single column on the original entities table to describe the current state. This can be achieved in a few different ways.


First Solution (and the worst)

As we just discussed, the easiest solution is to add a column to the Orders table. This column can store the current state of order.

It can be achieved by storing the state as a string:

CREATE TABLE `orders` (
 `id` INT unsigned NOT NULL AUTO_INCREMENT,
 `customer_id` INT unsigned NOT NULL,
 `order_value` INT NOT NULL,
 `state` VARCHAR NOT NULL,
 `created_at` DATE NOT NULL,
 `updated_at` DATE NOT NULL,
 PRIMARY KEY (`id`)
);
Enter fullscreen mode Exit fullscreen mode

This isn't good for a lot of reasons. The most obvious one is that the application can submit any string to the state column, even one that has not been designed for in the application. This can break the application and cause problems for the service.

We could solve that issue by making the state column an enum and only allowing expected values.

ALTER TABLE orders ADD state enum('pending','paid',...);

Enter fullscreen mode Exit fullscreen mode

This is still a bad solution.

  • We have to run an ALTER on the table every time we wish to add new states.
  • We cannot store additional metadata about the state as we are limited to a single column with a specific string stored.
  • It can be challenging to disable values. What if the store is no longer offering refunds?
  • ENUM datatype is not standard SQL and is not supported by many ORM.
  • It is complicated if we wish to output the list of possible values for state in our application.

Slightly Better Solution

It is clear from the previous approaches that having a single column denoting the state is not ideal. The next logical solution we could design is creating an order states table and referencing it from the orders table via a foreign key.

CREATE TABLE `order_states` (
 `id` INT NOT NULL AUTO_INCREMENT,
 `key_name` VARCHAR NOT NULL,
 `created_at` DATE NOT NULL,
 `updated_at` DATE NOT NULL,
 PRIMARY KEY (`id`)
);
INSERT INTO order_states (key_name)
  VALUES('pending', 'paid',...);
ALTER TABLE orders 
ADD CONSTRAINT FK_OrderState 
FOREIGN KEY(state_id)
REFERENCES id(order_states);
Enter fullscreen mode Exit fullscreen mode

This solves some of the previous issues. There is no need to run ALTER on the table when adding new states, and any additional metadata that needs to be stored against the state can be added as a new column in the order_states table. If a state needs to be disabled/removed, it is easy to select orders by state_id, and we also get the protection of valid states via foreign key constraints.

Best Solution

The solution we just implemented has solved most of our problems but we have issues with the visibility of an entities state over time. Because the order is assigned a state via a single FK reference it means we only have information about the current state of the order. If the order is currently in a state of Cancelled there is no easy way for me to tell when it was Paid.

We can solve this by adding another pivot table.

CREATE TABLE `order_state_history` (
 `id` INT NOT NULL AUTO_INCREMENT,
 `order_id` INT unsigned NOT NULL,
 `order_state_id` INT NOT NULL,
 PRIMARY KEY (`id`)
);
Enter fullscreen mode Exit fullscreen mode

In addition to creating this table we will remove the state_id column from the Orders table.

This pivot table works as follows: Any time you wish to change the state of an order you INSERT a new record into this table. You never need to UPDATE any rows here. You will also likely never need to DELETE any rows (unless you are purging records).

The current state of an order is the most recent record in this table (associated with the order_id)

You can get the order with its latest state by using the query below:

SELECT orders.*, order_state_history.*
FROM orders O
JOIN order_state_history OSH1 ON (O.id = OSH1.order_id)
LEFT OUTER JOIN order_state_history OSH2 ON (O.id = OSH2.order_id AND (OSH1.created_at < OSH2.created_at OR (OSH1.created_at = OSH2.created_at AND OSH1.id < OSH2.id)))
WHERE OSH2.id IS NULL;
Enter fullscreen mode Exit fullscreen mode

Drawbacks
It would not be objective if I did not at least point out some of the drawbacks of the above solution.

  • Additional pivot table required linking Order to State
  • Selecting data from Orders requires a more complex query (including JOINs)
  • You will end up having to store additional rows for each order. Worst case is Orders * count(possible_state_values)

Summary

The last solution discussed is more robust because it is INSERT only. We do not need to worry about race conditions.

If we receive a stream of updates, we simply write each record in the order we receive it. It also ensures we have a complete historical record of an order and the states it held.

This makes it trivial to look at the data set and ask questions like:

  • How many orders did I dispatch between 1/12/2021 and the 31/12/2022
  • What was the total value of orders refunded during August

These questions would have been impossible to answer with the other solutions because we did not retain the history of state movements - only the current one.

The additional code to be written is minimal, and any performance hit from using JOINs is trivial when using proper indexing.

Any concerns about the additional data stored can easily be rectified by running a simple query to remove unneeded state history items. In pseudocode terms - 

  • Delete all order_state_history items IF created_at > 1 year ago AND
  • order_state_history_id < MAX order_state_history_id for any given record

Running a query like this would remove historical items (more than one-year-old) but retain the latest state for each Order (as this will remain important)

In conclusion, it is a much better and more complete way of managing the state in your application and should be utilised where plausible.

💖 💪 🙅 🚩
danielbenzie
Daniel

Posted on April 19, 2022

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related