Query optimisation using database indexes

jackmarchant

Jack Marchant

Posted on February 29, 2024

Query optimisation using database indexes

Originally published on jackmarchant.com

A common question in software engineering interviews is how can you speed up a slow query? In this post I want to explain one answer to this question, which is: to add an index to the table the query is performed on.

What is an index in a relational database?

An index in a relational database is a key-value mapping for one or many columns where the key is the data in the column and the value is the primary ID of the row that contains the data.
A primary index also exists in every database table so querying by ID is always fast. A custom index is a reverse-lookup to that primary index.

How does an index speed up database queries?

An index tells the database which rows contain specific values, without having to scan each row individually.

A common way to understand it is the index of a phone book.
If I was trying to find someone with the last name "Martin" in a phone book, I would skip to the back pages to the index, find names starting with M and start looking from the referenced page number.

A database does the same lookup with an index.

Let's take a look at a more concrete example. Suppose we create a new table:

CREATE TABLE `users` (
  `id`          bigint          NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `name`        varchar(255)    NOT NULL,
  `status`      int             NOT NULL,
  `joined_on`   datetime        NOT NULL
);
Enter fullscreen mode Exit fullscreen mode

A query to find the users where status is 1 would result in a full table scan.

explain select * from users where status = 1;
> ... | Extra
        Using where
Enter fullscreen mode Exit fullscreen mode

If we add an index to the status column because we know it's a common access pattern for our application:

ALTER TABLE users ADD INDEX status(status);
Enter fullscreen mode Exit fullscreen mode

When we run the explain again, we can see it

explain select * from users where status = 1;
> ... | key     | .. | Extra
        status  | .. | Using index
Enter fullscreen mode Exit fullscreen mode

When the database performs the operations for this query it will use the index instead of scanning every row, which starts making a big difference when there are millions of rows to scan.

Handling complex queries with a composite index

Continuing with this example, let's assume we have another access pattern which is to find all the users with a specific status who joined after a certain date ordered from most to least recent.

explain select * from users where status = 1 and joined_on >= '2024-02-24' order by joined_on desc;
> ... | key     | .. | Extra
        status  | .. | Using where; Using filesort
Enter fullscreen mode Exit fullscreen mode

Without an index on joined_on column the query could still benefit from the index we added on status. It may not be the best performance, however, with the addition of the joined_on filter and the sort, which would result in a filesort operation which could make overall performance worse.

We could go ahead and create an index for joined_on but the database may still choose the status index and perform a filesort.

What would have better performance is a composite index with both status and joined_on.

ALTER TABLE users ADD INDEX status_joined_on(status, joined_on);
Enter fullscreen mode Exit fullscreen mode

After adding the index, this is what the explain looks like:

explain select * from users where status = 1 and joined_on >= '2024-02-24' order by joined_on desc;
> ... | key                 | .. | Extra
          status_joined_on  | .. | Using index condition; Backward index scan
Enter fullscreen mode Exit fullscreen mode

An index can be stored in either ascending or descending order depending on the definition. We see Backward index scan because we need the reverse order (descending) to sort results for the query above.

If we were to create the index where joined_on column is sorted in descending order we would see the Backward index scan removed:

status_joined_on(status, joined_on DESC)
Enter fullscreen mode Exit fullscreen mode

Now we can run the explain again:

explain select * from users where status = 1 and joined_on >= '2024-02-24' order by joined_on desc;
Enter fullscreen mode Exit fullscreen mode

This is an ideal index for this type of query.

Summary

We explored creating indexes on relational databases and evaluated performance at each step. What did we learn along the way?

  • An index in a relational database is a key-value mapping for one or many columns to tell the database which rows contain what values without having to scan each row.

  • Indexes can speed up query performance at the cost of write performance, though the former typically outweighs the latter.

  • For complex queries, it's possible to create a multi-column index. Ordering the columns is an important factor in its performance.

  • A descending index can help with searches for most recent data.

💖 💪 🙅 🚩
jackmarchant
Jack Marchant

Posted on February 29, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related