A Complete Guide to Generated Columns in MySQL
Arctype Team
Posted on June 28, 2022
MySQL generated columns pose as a powerful, easy-to-use, and advanced tool for anyone who wants to add automatically generated data to their tables - in this blog, we will learn everything you need to know to master them.
Generated columns allow you to store automatically generated data in a table without using the INSERT
and UPDATE
clauses. This useful feature has been part of MySQL since version 5.7, and it represents an alternative approach to triggers when it comes to generating data. Also, generated columns can help you make your query easier and more efficient.
Let’s now learn everything you need to know to master generated columns in MySQL.
What is a MySQL Generated Column?
A generated column is similar to a normal column, but you cannot change its value manually. This is because an expression defines how to generate the value of a generated column based on the other values that are read from the other columns of the same row. So, a generated column works within the domain of a table, and its definition cannot involve JOIN
statements.
In other words, you can think of a generated column as a sort of view but limited to columns. Notice that generated columns are different from SQL triggers, and you can define them only when using CREATE TABLE
orALTER TABLE
statements with the syntax below:
generate_column_name column_type [GENERATED ALWAYS] AS (generation_expression)
[VIRTUAL | STORED] [NOT NULL | NULL]
[UNIQUE [KEY]] [[PRIMARY] KEY][COMMENT *string*]
The AS (generated_column_expression)
clause specifies that the column you are adding or updating to a table is a generated column. The generation_expression
defines the expression that MySQL will use to compute the column values, and it cannot reference another generated column or anything other than the columns of the current table. Also, notice that the generation expression can only involve immutable functions. For example, you cannot use the CURDATE()
function that returns the current date in a generated column expression definition because it is a mutable function.
You can also precede AS
with the GENERATED ALWAYS
keywords to make the generated nature of the column more explicit, but this is optional. Then, you can indicate whether the type of the generated column is VIRTUAL
or STORED
. You will learn the difference between the two types in the chapter below. By default, if not explicitly specified in the query, MySQL marks a generated column as VIRTUAL
.
Let’s now see the generated column syntax in action in a CREATE TABLE
query:
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(100) NOT NULL,
first_name VARCHAR(60) NOT NULL,
last_name VARCHAR(60) NOT NULL,
full_name varchar(120) GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name))
);
In this example, the full_name
column will automatically store the concatenation of the first_name
and last_name
columns.
What Types of Generated Columns Exist in MySQL?
As explained earlier, you can define a generated column as VIRTUAL or STORED. Let’s now take a closer look at these two types.
Virtual generated columns
MySQL does not store a generated column marked as VIRTUAL. This means that MySQL evaluates its value on the fly when required. This typically happens immediately after any BEFORE
query is triggered. In other terms, a virtual generated column takes up no storage space.
Stored generated columns
MySQL stores any generated column marked as STORED. This means that MySQL takes care of evaluating its value and storing it on the disk every time you insert or update a row. In other terms, a stored column requires storage space as if it was a normal column.
Virtual generated columns vs Stored generated columns
Let's now learn more about the pros and cons of virtual and stored generated columns.
Virtual generated columns
Pros
- Their creation is instantaneous because MySQL only has to change the table metadata.
- They require no disk space.
-
INSERT
andUPDATE
queries come with no overhead because MySQL does not need to generate them.
Cons
- MySQL has to evaluate them when reading a table, making
SELECT
queries involving them slower.
Stored Generated Columns
Pros
- MySQL can read them as if they were normal columns, which assures fast retrieval with no overhead.
Cons
- When added to a new table, MySQL has to rebuild the entire table.
-
INSERT
orUPDATE
comes with an overhead because MySQL has to generate them. - They require disk space. Also, notice that you can mix VIRTUAL and STORED columns within a table, and they both support MySQL indexes and secondary indexes. But as explained in the official documentation, secondary indexes on virtual columns take up less space and memory compared to stored generated columns. So, virtual generated columns are more efficient when it comes to secondary indexes.
Why Adopt Generated Columns?
There are several reasons to adopt generated columns, but the three below are the most important ones.
- They provide you with cache capabilities to make your queries faster: generated columns give you the possibility to define columns containing useful data so you can then efficiently retrieve it whenever there is a need.
- They allow you to simplify query expressions: instead of making your queries complex, you can spread the complexity over the generated columns and then use them in simple filter operations.
- They empower you to define functional indexes: MySQL implements functional indexes as hidden virtual generated columns. In other terms, generated columns give you the possibility to define efficient and advanced indexes involving MySQL functions.
MySQL Generated Columns in Action in Real-world Examples
Let’s now see generated columns in action in some examples coming from my experience as a backend developer collaborating with data-driven startups.
Using a generated column to concatenate columns for consistency reasons
When developing the frontend of your application, you may notice certain patterns in data representation. For example, in sports, players on a team are generally identified with the following format:
first_name [first_carachter_middle_name.] last_name [(jersey_number)]
As you can see, you can easily generate this data field by aggregating the four columns with the following generated column:
string_identifier VARCHAR(255) GENERATED ALWAYS AS
(CONCAT(first_name, ' ', IF(middle_name IS NULL, '', CONCAT(LEFT(middle_name, 1), '. ')), last_name, IF(jersey_number IS NULL, '', CONCAT(' (', jersey_number, ')'))))
STORED
This would produce:
Cristiano Ronaldo (7)
Lionel A. Messi (10)
In this case, a generated column empowers you to standardize the data field format directly at the database level. In addition, a stored generated column avoids the inevitable overhead of constructing this field every time you need it.
Employing a generated column to automatically generate hashed IDs
Typically, you use the IDs of your resources in the URL of your website or REST APIs to retrieve the data you need. But publicly exposing your IDs could pose a security problem. That is especially true when you find yourself using autoincremental IDs, which are easy to predict and make scraping or bot attacks easier.
To avoid this, you can think about hiding your original IDs through the use of automatically generated, random, more secure public IDs. You can achieve this with a virtual generated column by hashing your IDs as follows:
public_id VARCHAR(40) GENERATED ALWAYS AS SHA1(CONCAT("PLAYER", id)) VIRTUAL
Notice that to avoid generating known hash values, you can concatenate your ID with a special keyword. Learn more about MySQL encryption and compression functions here.
Defining a Generated Column to Simplify Data Filtering
When filtering data, there are columns that are more useful than others. Also, you often have to change the representation of the values stored in the columns to make filtering simpler or more intuitive. Instead of doing this in every filter query, you can define a helpful generated column storing the information you need to perform the filtering in the desired format.
For example, you could define a generated column to more easily find the players on a basketball team as follows:
filter_string VARCHAR(255) GENERATED ALWAYS AS
(CONCAT(first_name, ' ', last_name, IF(birthdate IS NULL, '', CONCAT(' ', DATE_FORMAT(birthdate, "%m-%d-%Y"))))
STORED
Such a column would produce:
LeBron James 12-30-1984
Stephen Curry 03-14-1988
This contains data useful for filtering and with the player's birthdate in the US format.
Generated columns vs. Triggers
As explained previously, you can only use generated columns within a table. Also, they can only involve immutable functions, and MySQL generates their values in response to an INSERT
or UPDATE
query. On the other hand, a trigger is a stored program that MySQL automatically executes whenever an INSERT
, UPDATE
or DELETE
event associated with a particular table occurs. In other terms, triggers can involve several tables and all MySQL functions. This makes them a more complete solution compared to generated columns. At the same time, MySQL triggers are inherently more complex to use and define and also slower than generated columns.
Final Thoughts
MySQL generated columns are undoubtedly a great tool that can serve many uses and bring several benefits. Adopting them is easy and effective, although not many database developers use them. Either way, knowing how to take advantage of them may become essential to making your backend application faster, cleaner, and less complex. I hope that by explaining the use cases of generated columns through real-world examples I was able to help you gain more knowledge in the database world, and until next time.
Posted on June 28, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.