Improving PostgreSQL queries

elmuerte

Michiel Hendriks

Posted on April 30, 2019


A while ago I commented that my big win was improving a query that took over an hour to execute so that it ran in just 30 seconds. In fact, for the client this query went from close to 2 hours to about 2 minutes, which included transferring the millions of rows. I have optimized quite a few other queries in the past months.

This article will be mostly about reporting or business intelligence (BI) queries. These queries are often large and complex, reason about a lot of data, and generally take a long time to execute. By long I mean tens of seconds, if not minutes or hours.

Some of the things I will explain in this article could also apply to more standard application queries, like search queries. But I cannot be sure about it. This brings me to the most important step.

Explain the query plan

Any serious DBMS has an option to get information about how the DBMS is going to execute the query. This is the query plan. Every time you change anything you should request a query plan to see how it was affected. Changes include changes to the PostgreSQL server settings, and also upgrades of the server. It could very well happen that your optimized query will now perform worse.

In PostgreSQL the query plan can be retrieved using the EXPLAIN command. You simply put this in front of your query and it will produce a whole bunch of output.

There are various options to the EXPLAIN command. I generally use EXPLAIN (VERBOSE) to get the most important information. When you also include the ANALYZE option PostgreSQL will also execute the query and return actual execution information, not just the plan. However, this is not really helpful if your query takes half an hour to execute. With ANALYZE you might also want to include the BUFFERS option to get information about shared buffer usage.

Another option which you might need is the FORMAT option. Which format to select depends on the tool you use for visualization/analysis of the query plan. The default output is quite human readable, but when the query is really complex it becomes difficult to keep track of the nesting.

EXPLAIN SELECT sum(i) FROM foo WHERE i < 10;

                             QUERY PLAN
---------------------------------------------------------------------
 Aggregate  (cost=23.93..23.93 rows=1 width=4)
   ->  Index Scan using fi on foo  (cost=0.00..23.92 rows=6 width=4)
         Index Cond: (i < 10)
(3 rows)

Output formats include JSON, XML, YAML, and the default TEXT.
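For example, to execute the query and capture actual timings, buffer usage, and a JSON plan in one go (keep in mind that ANALYZE really runs the query):

EXPLAIN (ANALYZE, VERBOSE, BUFFERS, FORMAT JSON)
SELECT sum(i) FROM foo WHERE i < 10;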

Query plan visualization

There are quite a few tools out there which can render the query plan into something easier to use.

Note: although these visualizations are really helpful, this article will show the plain text output, as most details are hidden behind interactive usage.

pgAdmin

The standard administration tool of PostgreSQL, pgAdmin, has visualization built in. Simply execute the query using the explain action and you will get a visual representation.

pgAdmin4 query plan visualization

The lines between the various query parts become thicker depending on the cost of that step. (Note: set the analyze options to include costs.) So it is easy to find where the biggest cost comes from. But note that the thickness of the line is not relative to the total cost of the query: a part with a cost of 1,000,000 would have the same thickness as a part with a cost of 10,000.

Hovering over the various nodes will show some additional details.

depesz' explain tool

Depesz has an online tool where you can paste the TEXT output of explain and it renders it as an easier to navigate HTML table.

It will highlight the most expensive parts of the query. Once you start optimizing the query you can keep adding new query plans to compare them.

depesz' explain tool

Postgres EXPLAIN Visualizer (pev)

Pev is another online tool to visualize the query plan. This one uses the JSON output format, and prefers it with the ANALYZE option.

It produces quite a nice looking output with hints on which parts of the query are most problematic.

Postgres EXPLAIN Visualizer (pev)

When you have really complex queries the visualization might become more troublesome to navigate.

Be sure to read the blog post for some more explanation about the tool.

I love it when a plan comes together

Once you have the query plan it is time to inspect it for possible improvement points.

Your first point of attention should go to the lowest level node with the highest cost. This is the place where you can probably gain the most. Each entry in the plan contains a cost range: cost=0.00..23.92. The first value is the estimated startup cost (before the first row can be returned), and the second is the estimated total cost of that node. These numbers have no sensible unit; just regard a low number as good and a high number as bad. When optimizing a query, note the initial costs, and then try to get both values down.

Generally the most costly parts are sequential table scans of large tables (Seq Scan in the query plan). The goal is to try to get rid of these. But do note that PostgreSQL will happily choose to perform sequential table scans for small tables instead of using an index. The cost of these is usually really low, so do not simply try to get rid of all sequential table scans.

As mentioned earlier, every time you make a change you must generate a new query plan. The cost might shift to a different part in the plan, making effectively no difference. Or the new plan might perform even worse.

A way more extensive explanation on the output can be found in the documentation.

Rules of Optimization

Rule #1: Don't

Rule #2: Don't Yet

-- Michael A. Jackson

Before optimizing a query, first try it. Then try it with different parameters. It might not even need optimization yet. Only when the performance is not acceptable, try to improve it until it is acceptable.

Be sure to test the query with different ranges to see how it scales. I generally test with N, 10N, and 100N, where N is the expected nominal range and 10N is the worst case. I always include 100N as the range you do not expect, but which will happen sooner than you ever hoped. It is just there to get a sense of how the query scales.

Settings of Interest

PostgreSQL has a lot of settings you can play around with which can make a lot of difference. Especially the resource consumption and query planning settings are of interest.

Out of the box PostgreSQL is configured really defensively, as if it is running on a machine with only 1GiB of RAM.

I will highlight some interesting settings here, which are not too dangerous to play around with. But please do not solely rely on this text. Always read the official documentation, and make sure you understand it before playing around with the settings.

Resource Consumption

Shared Buffers

One of the first settings you want to adjust is shared_buffers. This is the amount of memory the different PostgreSQL connections can use to share data, such as indexes which were read from disk. The recommendation is to set it to 25% to 40% of the system RAM. A PostgreSQL expert told me that with the 9.x series, assigning more than 8GiB to shared buffers generally had no significant effect.

Unlike various other database systems PostgreSQL relies a lot on OS functionality like disk caching in memory. So if you have a server with 64GiB of RAM, and shared buffers set to "only" 8GiB a lot of indexes might not be in the shared cache of PostgreSQL, but they are in the disk cache, and thus quickly accessible.
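As a sketch (the value is purely illustrative, not a recommendation for your system), the setting can be changed with ALTER SYSTEM; it only takes effect after a server restart:

ALTER SYSTEM SET shared_buffers = '8GB';
-- shared_buffers is only read at server start, so restart PostgreSQL afterwards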

Temp Buffers

The temp_buffers setting controls how much memory can be used for temporary tables before flushing to disk. This setting can be changed at runtime, but only before the first use of temporary tables in the session.

Work Mem

Another interesting setting is work_mem. It affects how much memory can be used for sorting and for hash tables before flushing data to disk. Increasing it can have a big positive effect on complex queries with a lot of sorting and joining of sub-results. For BI-like queries it is well worth increasing the size.

The work memory allocation is per operation, and a single query can run several of these at the same time (especially in newer PostgreSQL versions, where parts of a query can be executed by parallel workers). This means you should be careful not to make it too big, or else you can run into out-of-memory issues.

This is a setting which can also be changed at runtime for the current session.
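For example (the values are assumptions for illustration, not recommendations), you could raise it just for the session that runs a heavy report:

SET work_mem = '256MB';     -- only affects the current session
SET temp_buffers = '64MB';  -- must be set before temporary tables are used in the session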

Query Planning Settings

These settings affect the query planning. It is quite easy to destroy the performance of your server by messing up these settings, so in most cases you need to be really careful.

Effective Cache Size

The effective_cache_size setting tells PostgreSQL how much data could be cached in the system memory. This includes caches managed by PostgreSQL, but also caches kept by the OS (like the disk cache).

This setting has no effect on memory allocation. It just gives an indication to the query planner on how likely data might be in memory.

On a Linux system you can get an idea of how much data is cached by the system if you have monitoring which keeps track of the amount of memory used for caches over a certain period. /proc/meminfo tells you how much is currently used, but you really want to know this for a time frame of a week or a month.
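As a sketch for the 64GiB server mentioned earlier (the number is an assumption for illustration): with 8GiB of shared buffers and most of the remaining RAM acting as disk cache, something in the order of 48GiB could be reasonable:

ALTER SYSTEM SET effective_cache_size = '48GB';
SELECT pg_reload_conf(); -- no restart needed for this setting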

Sequential / Random Page Cost

The seq_page_cost and random_page_cost settings determine how costly it is to read data from the disk. The default settings assume spinning disks where random seek times are much higher than sequential access.

If you have all-flash storage the random seek times are much closer to sequential access. In that case it makes sense to bring the random cost closer to the sequential cost. To some degree this also applies to SAN storage on spinning disks.
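As an illustration (1.1 is a value often used for SSDs, but treat it as an assumption to verify rather than a universal truth), you could bring the random cost close to the default sequential cost of 1.0:

ALTER SYSTEM SET random_page_cost = 1.1; -- the default is 4.0
SELECT pg_reload_conf();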

Indexes

Improving performance of queries is all about indexes. Without indexes the only thing the database can do is sequential table scans. But indexes are not free. They cost storage, and time to maintain. Changing data in the table means that affected indexes also need to be updated. More complex/advanced indexes obviously take more time.

The worst index is an index which is not used. So only create indexes when you need to. And remove indexes which are not used.
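A sketch for finding candidates: the statistics views track how often each index is used, so an index with zero scans since the last statistics reset deserves a closer look (but be careful with indexes that enforce constraints):

select schemaname, relname, indexrelname, idx_scan
from pg_stat_user_indexes
where idx_scan = 0
order by relname;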

Reminder: as stated earlier, for small tables the query planner might choose to skip using an index.

Multi Column and Sorted Indexes

An index which covers multiple columns is generally better than creating multiple single-column indexes. PostgreSQL will consider using a multi-column index even if some of the columns in the index are not used in the query.

For example if we had the following index:

create index order_dates on orders(creation_date, completion_date);

It can be used for queries using any combination of the creation_date and completion_date columns. The benefit of having fewer indexes is that they might live in the shared memory, and caches, for a longer time.

The default index storage type in PostgreSQL is a B-Tree. For a B-Tree it does not matter whether you look up the highest or the lowest value; it has the same complexity. But if most of your queries sort in descending order on an indexed column, there will be an extra step in the query execution to sort the results.

This step can be eliminated by adding an order to the index. There are two important ordering aspects:

  1. data ordering
  2. null values

You can control the order for each independently. By default each column is ASC NULLS LAST, meaning ascending with null values at the end.

If it were important to know which orders have not been completed yet, it might be best to put the null values first.

create index order_dates on orders(creation_date, completion_date nulls first);

If you are also interested in the most recently completed orders, it is probably best to also make the completion_date descending.

create index order_dates on orders(creation_date, completion_date desc);
-- is equivalent to
create index order_dates on orders(creation_date, completion_date desc nulls first);

Just to reiterate, ordering of the index only affects performance when the query contains an order by clause on those columns. It does not make the index search faster.
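For example, a hypothetical query that asks for exactly the order stored in the index can read the rows straight from the index, without a separate sort step:

select *
from orders
order by creation_date, completion_date desc
limit 50;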

Partial Indexes

Indexes also take up space, and a large index is not good for caching. Creating a smaller index can therefore give a massive performance improvement. Smaller indexes can be achieved with partial indexes.

Take the previous example of the order_dates. Suppose we often need to query orders which have not been completed yet, and it is common for orders to eventually be completed. Then it is possible to create a small index on the completion date being null.

create index open_orders on orders(completion_date) where completion_date is null;

This index would be really small compared to a full index on the completion_date column. Because it is small, and probably often used, it will be cached quite a lot. And when it is not, only a few expensive disk read operations are needed.
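For instance, a query whose where clause matches the index predicate can use it (a hypothetical example):

select id, creation_date
from orders
where completion_date is null;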

Index on Expressions

Quite often queries contain expressions in a where clause. For example a case insensitive search:

select * from orders where lower(city) = 'devtown';

Creating a normal index on the city column has no benefit, as that index would contain Devtown, devtown, etc. as separate entries. To make this query faster you can create an index on an expression:

create index lc_city on orders(lower(city));

Now the above query could use this index to quickly find the appropriate rows.

Index-Only Scans

An Index-Only Scan is where the index contains all the data which was required. Normally the index produces a pointer to the actual data row, which then needs to be fetched.

But, if you create an index which contains all the data which is selected in the query then the database does not have to fetch the row. You could consider this as a really fast view on a single table.

You do not have to make the extra fields part of the indexed data. You can include them in the index as additional data (the INCLUDE clause is available since PostgreSQL 11).

create unique index product_ean on product(ean) include(name);

This creates a unique index on the product's EAN and includes the name column in the index. The following query would be able to perform an index-only scan:

select product.ean, product.name, order_line.quantity
from order_line
join product on product.ean = order_line.product_ean

An obvious benefit of using INCLUDE(...) is that you can add additional columns to a unique index without making it part of the uniqueness constraint. But be aware that it will consume storage.

Tuning Tricks

When you start tuning queries it is strongly advised to vacuum and analyze, or at least analyze, the tables you are going to play around with.

Both actions can be done without locking tables. But obviously they will affect performance of running operations.
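For example, for the orders table used in the examples below (a minimal sketch):

vacuum (verbose, analyze) orders;
-- or, to only refresh the planner statistics:
analyze orders;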

As the query optimizer is affected by the gathered statistics, it is important that they are relatively fresh. PostgreSQL will auto-vacuum and auto-analyze tables at certain moments in time. It could very well happen that this occurs during your tuning process. Then it might look like you improved the query, while it was actually due to a different query plan based on the newer statistics.

You can check how recently tables were analyzed or vacuumed by checking the pg_stat_all_tables view.
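Something like this shows when a table was last (auto-)vacuumed and (auto-)analyzed:

select relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
from pg_stat_all_tables
where relname = 'orders';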

But now for some tricks I applied recently to improve some queries.

Too Big to Join

This first trick will seem weird. You would expect the query planner to do this by default. But apparently this was not the case on the server where I applied it. Maybe it has changed in newer versions, but for PostgreSQL this trick turned a query which took hours to execute, into a query which takes only a few seconds.

The goal of this query was to retrieve a bunch of details on orders which were changed during a specific period. The database structure is roughly as follows.

These tables contain a lot of rows of data:

Table               Rows            Size
orders              31,889,100      15 GiB
order_line          73,699,400      70 GiB
order_line_detail   2,320,400,000   265 GiB

Attempt #1: Simple Select

The most straightforward query to get all the details of the orders which were changed in the last 6 hours would be like this:

select d.* 
from orders o
join order_line l on l.order_id = o.id
join order_line_detail d on d.line_id = l.id
where o.last_update > now() - interval '6 hours';

This query takes hours to execute, and the query plan shows why. (Note: because the query takes a really long time, the plans in this section were generated without actual timings.)

Hash Join  (cost=10717890.78..77356068.23 rows=823481 width=28)
  Hash Cond: (d.line_id = l.id)
  ->  Seq Scan on order_line_detail d  (cost=0.00..57928447.92 rows=2320398592 width=28)
  ->  Hash  (cost=10717563.84..10717563.84 rows=26155 width=4)
        ->  Hash Join  (cost=42202.77..10717563.84 rows=26155 width=4)
              Hash Cond: (l.order_systemid = o.id)
              ->  Seq Scan on order_line l  (cost=0.00..9938105.76 rows=73699376 width=8)
              ->  Hash  (cost=42061.31..42061.31 rows=11317 width=4)
                    ->  Index Scan using order_last_update_idx on orders o  (cost=0.57..42061.31 rows=11317 width=4)
                          Index Cond: (last_update > (now() - '06:00:00'::interval))

It selects the orders based on an index. But after that the two biggest tables receive a sequential scan. You might think indexes on the foreign keys are missing. But this is not the case.

Attempt #2: Sub-selects

So let's try something else, using sub-selects.

select d.*
from order_line_detail d
where d.line_id in (
    select l.id
    from order_line l
    where l.order_id in (
        select o.id
        from orders o
        where o.last_update > now() - interval '6 hours'
    )
);

The plan changed and looks more complex, but the two really expensive sequential table scans are still there.

Hash Join  (cost=11221517.77..94488998.34 rows=2320398592 width=28)
  Hash Cond: (d.line_id = l.id)
  ->  Seq Scan on order_line_detail d  (cost=0.00..57928447.92 rows=2320398592 width=28)
  ->  Hash  (cost=11136752.37..11136752.37 rows=6781232 width=4)
        ->  Unique  (cost=11102846.21..11136752.37 rows=6781232 width=4)
              ->  Sort  (cost=11102846.21..11119799.29 rows=6781232 width=4)
                    Sort Key: l.id
                    ->  Hash Semi Join  (cost=42195.43..10333409.79 rows=6781232 width=4)
                          Hash Cond: (l.order_systemid = o.id)
                          ->  Seq Scan on order_line l  (cost=0.00..9938105.76 rows=73699376 width=8)
                          ->  Hash  (cost=42053.99..42053.99 rows=11315 width=4)
                                ->  Index Scan using order_last_update_idx on orders o  (cost=0.57..42053.99 rows=11315 width=4)
                                      Index Cond: (last_update > (now() - '06:00:00'::interval))

Still not really usable.

Attempt #3: Common Table Expressions

PostgreSQL has support for so-called Common Table Expressions (CTE). While a simple query like this is not the typical use case for CTEs, it would not hurt to try.

The new query became:

with cte_order as (
    select o.id 
    from orders o
    where o.last_update > now() - interval '6 hours'
),
cte_line as (
    select l.id
    from order_line l
    where l.order_id in (
        select * from cte_order
    )
)
select d.*
from order_line_detail d
where d.line_id in (select * from cte_line)

Looks quite complicated for what it is supposed to do. Basically each sub-select was moved to a CTE.

So let's take a look at the query plan. (Note: this time executed with actual timings, because it's so fast.)

Nested Loop  (cost=1225650.89..19184158.78 rows=1160199296 width=28) (actual time=471.251..1171.805 rows=1500960 loops=1)
  CTE cte_order
    ->  Index Scan using co_last_update_idx on orders o  (cost=0.57..42009.62 rows=11303 width=4) (actual time=0.262..51.518 rows=8777 loops=1)
          Index Cond: (last_update > (now() - '06:00:00'::interval))
  CTE cte_line
    ->  Nested Loop  (cost=254.88..354522.71 rows=36849688 width=4) (actual time=56.620..443.167 rows=35141 loops=1)
          ->  HashAggregate  (cost=254.32..256.32 rows=200 width=4) (actual time=56.567..61.080 rows=8777 loops=1)
                ->  CTE Scan on cte_order  (cost=0.00..226.06 rows=11303 width=4) (actual time=0.265..53.986 rows=8777 loops=1)
          ->  Index Scan using order_line_order_id_idx on order_line l  (cost=0.57..1765.34 rows=599 width=8) (actual time=0.020..0.042 rows=4 loops=8777)
                Index Cond: (order_id = cte_order.id)
  ->  HashAggregate  (cost=829117.98..829119.98 rows=200 width=4) (actual time=471.222..489.199 rows=35141 loops=1)
        ->  CTE Scan on cte_line  (cost=0.00..736993.76 rows=36849688 width=4) (actual time=56.623..454.029 rows=35141 loops=1)
  ->  Index Scan using order_line_detail_entity_id_idx on order_line_detail d  (cost=0.58..89389.59 rows=40294 width=28) (actual time=0.006..0.014 rows=43 loops=35141)
        Index Cond: (line_id = cte_line.id)

There you have it. No sequential table scans, only index usage. The query took a little over 1 second to find 1,500,960 rows. The cost estimate is seriously off on this one. This is why you should not rely on just the cost estimations, but also try EXPLAIN with the ANALYZE option.

Even though all three queries produce the exact same output, feeding a different construction to the query planner can give quite a different result.

Pivot Point

This next trick took a lot of time to pull off. First of all, I had to properly format this massive 350-line reporting query into something that was readable.

Part of the query sort of looked like this.

with order_price_transport as (
    select orders.id, price_parts.code, price_parts.status, sum(price_parts.amount)
    from orders
    join price_parts on price_parts.order_id = orders.id
    where price_parts.code in ('FUEL', 'FREIGHT')
    group by orders.id, price_parts.code, price_parts.status
),
-- ...
order_price_other as (
    select orders.id, price_parts.code, price_parts.status, sum(price_parts.amount)
    from orders
    join price_parts on price_parts.order_id = orders.id
    where price_parts.code not in ('FUEL', 'FREIGHT', ...)
    group by orders.id, price_parts.code, price_parts.status
),
-- ... a whole bunch of other CTEs
select
    orders.id,
    -- ...
    est_trans_price.amount,
    act_trans_price.amount,
    est_other_price.amount,
    act_other_price.amount,
    -- ...
from orders
join order_price_transport as est_trans_price on est_trans_price.id = orders.id and est_trans_price.status = 'estimated'
join order_price_transport as act_trans_price on act_trans_price.id = orders.id and act_trans_price.status = 'actual'
-- ...
join order_price_other as est_other_price on est_other_price.id = orders.id and est_other_price.status = 'estimated'
join order_price_other as act_other_price on act_other_price.id = orders.id and act_other_price.status = 'actual'
-- ...
where orders.creation_date between $P{START_DATE} and $P{END_DATE}
and orders.organization_id in ($P!{ORGANIZATION_LIST})
and orders.status_id in (123, 456)

The places where I put -- ... contained even more stuff. Suffice it to say, this report did not run well. It contained a lot of similar common table expressions (CTEs). Some of them were even joined multiple times.

The results of CTEs, and views for that matter, are not cached. They are executed every time they are used. So just from the query which is shown above the price_parts table is queried 4 times already.

You will note that the main query contains some where clauses on the orders table. These were not included in the CTEs which basically used the same order rows. Copying those where clauses to each CTE already makes quite the improvement. But the price_parts and orders tables are still consulted much more than needed.

The price_parts table can contain multiple rows of price parts of an order. Each row has a code, amount, and status (if it was an estimation, or the actual price). Part of this report was to produce an overview of certain groups of prices for a selection of orders. Basically, it was meant to pivot the rows of the price_parts table to columns in the resulting report.

There is no need to pivot this data at the last possible moment. You can do that in a CTE too. So that's what I did. I combined all the CTEs reasoning on the same tables, where the difference was mostly the content of the where clauses. This transformed the above query into this (maybe even more complex looking) query:

with order_prices as (
    select
        id,
        sum(case when status = 'estimated' and code = 'trans' then amount else 0 end) as est_trans_price,
        sum(case when status = 'actual' and code = 'trans' then amount else 0 end) as act_trans_price,
        -- ...
        sum(case when status = 'estimated' and code = 'other' then amount else 0 end) as est_other_price,
        sum(case when status = 'actual' and code = 'other' then amount else 0 end) as act_other_price
    from (
        select
            orders.id,
            case 
                when price_parts.code in ('FUEL', 'FREIGHT') then 'trans'
                -- ...
                else 'other'
            end as code,
            price_parts.status,
            sum(price_parts.amount) as amount
        from orders
        join price_parts on price_parts.order_id = orders.id
        where orders.creation_date between $P{START_DATE} and $P{END_DATE}
            and orders.organization_id in ($P!{ORGANIZATION_LIST})
            and orders.status_id in (123, 456)
        group by orders.id, price_parts.code, price_parts.status
    ) as data
    group by id
)
-- ...
select 
    orders.id,
    -- ...
    op.est_trans_price,
    op.act_trans_price,
    op.est_other_price,
    op.act_other_price
from orders
join order_prices op on op.id = orders.id
-- ...
where orders.creation_date between $P{START_DATE} and $P{END_DATE}
and orders.organization_id in ($P!{ORGANIZATION_LIST})
and orders.status_id in (123, 456)

In the innermost select of the order_prices CTE all the relevant data is gathered from the price_parts table, duplicating the filters on the applicable orders rows. The rows are grouped on the order ID, price part code, and status, and the amounts are summed. Instead of returning the actual code of the price_parts row, it is translated into a new group which will be used during the pivot.

This select query is then used as a table for the outer select of the order_prices CTE. This is where the pivot takes place. All of the data is selected and grouped on just the id. For each combination of code and status a column is made using a conditional sum aggregation.

The resulting query was not much smaller, as the new CTEs take almost the same number of lines (only 50 fewer). It did however reduce 21 joins of CTEs to only 5, as there were now only 5 CTEs. Not only is the number of joins lower, the size of the data to carry along in each part of the query is also smaller, and thus less memory is needed to execute the query.

With these changes the report execution time dropped from 30 minutes, to just 1 minute.

Limitless

So this is a case where the query ran well, until the point where we changed the server configuration to make better use of the available system memory.

The query is quite simple:

select *
from orders 
where organization_id in (7000)
and status_id in (2, 28)
and order_type_id in (146630533)
and system_entry_date < '2019-03-24 23:30:03.524'
order by system_entry_date asc
limit 25

This is the query plan (no actual times, because it takes too long).

Limit  (cost=0.56..23639.89 rows=25 width=2972)
  ->  Index Scan using idx_order_entry_date on orders  (cost=0.56..10885436.12 rows=11512 width=2972)
        Index Cond: (system_entry_date < '2019-03-24 23:30:03.524'::timestamp without time zone)
        Filter: ((status_id = ANY ('{2,28}'::integer[])) AND (organization_id = 7000) AND (order_type_id = 146630533))

At first sight it looks OK, except that it is using the idx_order_entry_date index, which is not the best index to use. There is a more specific index which also includes some of the other columns. With this query plan it is basically executing a sequential table scan: the index still selects the majority of the rows, and it is the subsequent filtering which reduces the number of rows.

If I drop the limit 25 the query plan changes drastically:

Sort  (cost=36349.54..36378.32 rows=11512 width=2972) (actual time=0.163..0.163 rows=0 loops=1)
  Sort Key: system_entry_date
  Sort Method: quicksort  Memory: 25kB
  ->  Index Scan using idx_order_statustype on orders  (cost=0.56..35573.01 rows=11512 width=2972) (actual time=0.136..0.136 rows=0 loops=1)
        Index Cond: ((status_id = ANY ('{2,28}'::integer[])) AND (order_type_id = 146630533) AND (system_entry_date < '2019-03-24 23:30:03.524'::timestamp without time zone))
        Filter: (organization_id = 7000)

This time the correct index is being used. Even though the estimated costs are higher, the actual time is much better.

If I drop the order by in the query instead, it also uses the correct index. Note that due to the nature of the b-tree index the results would actually be returned in ascending order to begin with.

Limit  (cost=0.56..77.82 rows=25 width=2972) (actual time=0.142..0.142 rows=0 loops=1)
  ->  Index Scan using idx_order_statustype on orders  (cost=0.56..35573.01 rows=11512 width=2972) (actual time=0.135..0.135 rows=0 loops=1)
        Index Cond: ((status_id = ANY ('{2,28}'::integer[])) AND (order_type_id = 146630533) AND (system_entry_date < '2019-03-24 23:30:03.524'::timestamp without time zone))
        Filter: (organization_id = 7000)

If I change the in (2, 28) into in (2), again the correct index is used. It does not matter whether you write foo in (2, 28) or foo = 2 or foo = 28.

Limit  (cost=0.56..63.64 rows=19 width=2972) (actual time=0.105..0.105 rows=0 loops=1)
  ->  Index Scan using idx_order_statustype on orders  (cost=0.56..63.64 rows=19 width=2972) (actual time=0.103..0.103 rows=0 loops=1)
        Index Cond: ((status_id = 2) AND (order_type_id = 146630533) AND (system_entry_date < '2019-03-24 23:30:03.524'::timestamp without time zone))
        Filter: (organization_id = 7000)

Even increasing the limit to 40 produces a better result:

Limit  (cost=35936.90..35937.00 rows=40 width=2972) (actual time=3.078..3.078 rows=0 loops=1)
  ->  Sort  (cost=35936.90..35965.68 rows=11512 width=2972) (actual time=3.076..3.076 rows=0 loops=1)
        Sort Key: system_entry_date
        Sort Method: quicksort  Memory: 25kB
        ->  Index Scan using idx_order_statustype on orders  (cost=0.56..35573.01 rows=11512 width=2972) (actual time=3.066..3.066 rows=0 loops=1)
              Index Cond: ((status_id = ANY ('{2,28}'::integer[])) AND (order_type_id = 146630533) AND (system_entry_date < '2019-03-24 23:30:03.524'::timestamp without time zone))
              Filter: (organization_id = 7000)

The query planner is clearly way too optimistic in using the idx_order_entry_date index. If you look at the first query plan you see that the pessimistic cost of using the wrong index is actually quite high. But the limit reduces the cost a lot. It goes from cost=0.56..10885436.12 to cost=0.56..23639.89.

After some searching this appears to be a rather complex bug in PostgreSQL.

In this particular case there were two suitable options available to improve the query:

  1. a higher limit; it only needed to be slightly higher, which did not impact the rest of the process.
  2. no ordering; this was not vital anyway, and the returned order was good enough already.

In some database systems you would solve this problem by using index hinting. But this is not something the PostgreSQL developers consider introducing.

The trick here is that sometimes relaxing some constraints can result in better performance. It might seem like a workaround for a bug, which it also is, but sometimes you need these tricks. This problem was observed in 9.5; it could very well be gone in a newer version.

Indexes go wild

B-Tree indexes (the default type) can also be used in wildcard searches when the wildcard is at the end: col LIKE 'foo%'.

There is a small gotcha though. If the used collation is not 'C' then it does not work out of the box.

create index city_name_idx on postal_location(city_name);

explain (analyze) select * from postal_location 
where city_name like 'LONDON%';
Seq Scan on postal_location  (cost=0.00..69965.75 rows=4392 width=75) (actual time=114.905..324.757 rows=4346 loops=1)
  Filter: ((city_name)::text ~~ 'LONDON%'::text)
  Rows Removed by Filter: 1974114

You need to explicitly define the index's operator class to use a pattern variant.

create index city_name_idx on postal_location(city_name varchar_pattern_ops);

explain (analyze) select * from postal_location 
where city_name like 'LONDON%';
Bitmap Heap Scan on postal_location  (cost=111.74..12683.84 rows=4392 width=75) (actual time=0.288..0.812 rows=4346 loops=1)
  Filter: ((city_name)::text ~~ 'LONDON%'::text)
  ->  Bitmap Index Scan on city_name_idx  (cost=0.00..110.64 rows=4221 width=0) (actual time=0.277..0.277 rows=4346 loops=1)
        Index Cond: (((city_name)::text ~>=~ 'LONDON'::text) AND ((city_name)::text ~<~ 'LONDOO'::text))

Note that you should also create an index with the default operator class if you want queries involving ordinary <, <=, >, or >= comparisons to use an index. Such queries cannot use the xxx_pattern_ops operator classes.

Leading Wildcards

If the wildcard is at the beginning, this will not work and you are back to a sequential scan of the table.

explain (analyze) select * from postal_location 
where city_name like '%LONDON';
Seq Scan on postal_location  (cost=0.00..69965.75 rows=4392 width=75) (actual time=118.001..361.855 rows=4316 loops=1)
  Filter: ((city_name)::text ~~ '%LONDON'::text)
  Rows Removed by Filter: 1974144

However you can create an index with an expression to index the reversed value.

create index city_name_rev_idx on postal_location(reverse(city_name) varchar_pattern_ops);

explain (analyze) select * from postal_location 
where reverse(city_name) like reverse('%LONDON');
Bitmap Heap Scan on postal_location  (cost=257.83..24199.06 rows=9892 width=75) (actual time=0.365..1.133 rows=4316 loops=1)
  Filter: (reverse((city_name)::text) ~~ 'NODNOL%'::text)
  ->  Bitmap Index Scan on city_name_rev_idx  (cost=0.00..255.35 rows=9892 width=0) (actual time=0.351..0.351 rows=4316 loops=1)
        Index Cond: ((reverse((city_name)::text) ~>=~ 'NODNOL'::text) AND (reverse((city_name)::text) ~<~ 'NODNOM'::text))

The cost estimate is much higher than that of the normal index, but still much lower than the sequential table scan. More importantly, the actual time is almost comparable to the normal index, and significantly better than the sequential table scan.

The reverse wildcard index is especially useful for places where people generally enter the last few characters instead of the first few. For example in case of order numbers.

Readable Code

The last trick should be a no-brainer. But as I mentioned earlier, I had to reformat a query before I could even start optimizing it.

I handled another huge reporting query, which was really badly formatted. While formatting the query so that I could figure out what it was actually doing, I saw something which might be a typo. In one of the sub-selects a wrong table alias was used. When I corrected the alias to the one I thought was correct, the query was really fast, and apparently the output was correct too.

This typo would probably have been easily spotted by the original author if they formatted their SQL properly.

To this day, I do not know what this query was actually supposed to do. But the output was correct, the query was fast, and I was done dealing with this issue within 30 minutes.
