Understanding the Postgres EXPLAIN cost
EXPLAIN is very useful for understanding the performance of a Postgres query. It returns the execution plan generated by the PostgreSQL query planner for a given statement. The EXPLAIN command shows whether the tables referenced in a statement will be searched using an index scan or a sequential scan.
Some of the first things you’ll notice when reviewing the output of an EXPLAIN command are the cost statistics, so it’s natural to wonder what they mean, how they’re calculated, and how they’re used.
In short, the PostgreSQL query planner is estimating how much time the query will take (in an arbitrary unit), with both a startup cost and a total cost for each operation. More on that later. When it has multiple options for executing a query, it uses these costs to choose the cheapest, and therefore hopefully fastest, option.
What unit are the costs in?
The costs are in an arbitrary unit. A common misunderstanding is that they are in milliseconds or some other unit of time, but that’s not the case.
The cost units are anchored (by default) to a single sequential page read costing 1.0 units (seq_page_cost). Each row processed adds 0.01 (cpu_tuple_cost), and each non-sequential page read adds 4.0 (random_page_cost). There are many more constants like this, all of which are configurable. That last one is a particularly common tuning candidate, at least on modern hardware. We’ll look into that more in a bit.
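If you want to check what these constants are set to on your own server, you can query pg_settings. A minimal sketch (the exact list of parameters you care about may differ):
-- Planner cost constants currently in effect on this server.
SELECT name, setting
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost',
               'cpu_tuple_cost', 'cpu_index_tuple_cost', 'cpu_operator_cost');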
Startup Costs
The first number you see after cost= is known as the "startup cost". This is an estimate of how long it will take to fetch the first row. As such, the startup cost of an operation includes the cost of its children.
For a sequential scan, the startup cost will generally be close to zero, as it can start fetching rows straight away. For a sort operation, it will be higher because a large proportion of the work needs to be done before rows can start being returned.
To look at an example, let’s create a simple test table with 1000 usernames:
CREATE TABLE users (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    username text NOT NULL
);
INSERT INTO users (username)
SELECT 'person' || n
FROM generate_series(1, 1000) AS n;
ANALYZE users;
Let’s take a look at a simple query plan, with a couple of operations:
EXPLAIN SELECT * FROM users ORDER BY username;
QUERY PLAN |
--------------------------------------------------------------+
Sort (cost=66.83..69.33 rows=1000 width=17) |
Sort Key: username |
-> Seq Scan on users (cost=0.00..17.00 rows=1000 width=17)|
In the above query plan, as expected, the estimated startup cost for the Seq Scan is 0.00, and for the Sort it is 66.83.
Total costs
The second cost statistic, after the startup cost and the two dots, is known as the "total cost". This is an estimate of how long it will take to return all the rows.
Let’s look at that example query plan again:
QUERY PLAN |
--------------------------------------------------------------+
Sort (cost=66.83..69.33 rows=1000 width=17) |
Sort Key: username |
-> Seq Scan on users (cost=0.00..17.00 rows=1000 width=17)|
We can see that the total cost of the Seq Scan operation is 17.00. For the Sort operation, it is 69.33, which is not much more than its startup cost (as expected).
Total costs usually include the cost of the operations beneath them in the plan. For example, the total cost of the Sort operation above includes that of the Seq Scan.
An important exception is LIMIT clauses, which the planner uses to estimate whether it can abort early. If it only needs a small number of rows, which is a common case, it may calculate that a simpler scan choice is cheaper (and likely faster).
For example:
EXPLAIN SELECT * FROM users LIMIT 1;
QUERY PLAN |
--------------------------------------------------------------+
Limit (cost=0.00..0.02 rows=1 width=17) |
-> Seq Scan on users (cost=0.00..17.00 rows=1000 width=17)|
As you can see, the total cost reported on the Seq Scan node is still 17.00, but the full cost of the Limit operation is reported to be 0.02. This is because the planner expects that it will only have to process 1 row out of 1000, so the cost, in this case, is estimated to be one-thousandth of the total.
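If you’re curious, you can try a larger limit yourself and watch the Limit node’s estimate scale with the fraction of rows requested (a quick sketch; the exact numbers on your system may differ):
-- With LIMIT 100, the planner expects to process roughly 100 of the 1000 rows,
-- so the Limit node's total cost should land near (100 / 1000) * 17.00 = 1.70.
EXPLAIN SELECT * FROM users LIMIT 100;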
How the costs are calculated
In order to calculate these costs, the Postgres query planner uses both constants (some of which we’ve already seen) and metadata about the contents of the database. The metadata is often referred to as "statistics".
Statistics are gathered via ANALYZE (not to be confused with the EXPLAIN parameter of the same name), and stored in pg_statistic. They are also refreshed automatically as part of autovacuum.
These statistics include a number of very useful things, like roughly the number of rows each table has, and what the most common values in each column are.
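You can peek at some of these statistics yourself via the pg_stats view, which exposes a human-readable subset of pg_statistic. A minimal sketch for our users table:
-- Per-column statistics gathered by ANALYZE: fraction of NULLs, estimated
-- number of distinct values, and the most common values (if any).
SELECT attname, null_frac, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'users';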
Let’s look at a simple example, using the same query data as before:
EXPLAIN SELECT count(*) FROM users;
QUERY PLAN |
-------------------------------------------------------------+
Aggregate (cost=19.50..19.51 rows=1 width=8) |
-> Seq Scan on users (cost=0.00..17.00 rows=1000 width=0)|
In our case, the planner's statistics suggested the data for the table was stored within 7 pages (or blocks), and that 1000 rows would be returned. The cost parameters seq_page_cost, cpu_tuple_cost, and cpu_operator_cost were left at their defaults of 1, 0.01, and 0.0025 respectively.
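If you want to confirm the page and row numbers the planner used, they are visible in pg_class (these are the values recorded by ANALYZE, not a live count):
-- relpages and reltuples are the planner's view of the table's size, refreshed
-- by ANALYZE and autovacuum rather than maintained in real time.
SELECT relpages, reltuples
FROM pg_class
WHERE relname = 'users';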
As such, the Seq Scan total cost was calculated as:
Total cost of Seq Scan
= (estimated sequential page reads * seq_page_cost) + (estimated rows returned * cpu_tuple_cost)
= (7 * 1) + (1000 * 0.01)
= 7 + 10.00
= 17.00
And for the Aggregate as:
Total cost of Aggregate
= (cost of Seq Scan) + (estimated rows processed * cpu_operator_cost) + (estimated rows returned * cpu_tuple_cost)
= (17.00) + (1000 * 0.0025) + (1 * 0.01)
= 17.00 + 2.50 + 0.01
= 19.51
How the planner uses the costs
Since we know Postgres will pick the query plan with the lowest total cost, we can use that to try to understand the choices it has made. For example, if a query is not using an index that you expect it to, you can use settings like enable_seqscan to massively discourage certain query plan choices. By this point, you shouldn’t be surprised to hear that settings like this work by increasing the costs!
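For example, to check whether the planner could use an index for a given query at all, you can temporarily discourage sequential scans in your session. A diagnostic sketch, not something to leave on in production:
-- Internally this just adds a very large cost penalty to Seq Scan nodes, so the
-- planner prefers any other option it has (such as an index scan on the primary key).
SET enable_seqscan = off;
EXPLAIN SELECT * FROM users WHERE id = 42;
-- Put things back to normal afterwards.
RESET enable_seqscan;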
Row numbers are an extremely important part of cost estimation. They are used to calculate estimates for different join orders, join algorithms, scan types, and more. Row cost estimates that are out by a lot can lead to cost estimation being out by a lot, which can ultimately result in a suboptimal plan choice being made.
Using EXPLAIN ANALYZE to get a query plan
When you write SQL statements in PostgreSQL, the ANALYZE option of EXPLAIN is key to optimizing queries, making them faster and more efficient. In addition to displaying the query plan and PostgreSQL's estimates, EXPLAIN ANALYZE actually executes the query (be careful with UPDATE and DELETE!) and shows the real execution time and row count for each step in the execution process. This is essential for monitoring SQL performance.
You can use EXPLAIN ANALYZE to compare the estimated number of rows with the actual rows returned by each operation.
Let's look at an example, using the same data again:
EXPLAIN ANALYZE SELECT * FROM users ORDER BY username;
QUERY PLAN |
-----------------------------------------------------------------------------------------------------------+
Sort (cost=66.83..69.33 rows=1000 width=17) (actual time=20.569..20.684 rows=1000 loops=1) |
Sort Key: username |
Sort Method: quicksort Memory: 102kB |
-> Seq Scan on users (cost=0.00..17.00 rows=1000 width=17) (actual time=0.048..0.596 rows=1000 loops=1)|
Planning Time: 0.171 ms |
Execution Time: 20.793 ms |
We can see that the total execution cost is still 69.33, with the majority of that being the Sort operation, and 17.00 coming from the Sequential Scan. Note that the query execution time is just under 21ms.
Sequential scan vs. Index Scan
Now, let’s add an index to try to avoid that costly sort of the entire table:
CREATE INDEX people_username_idx ON users (username);
EXPLAIN ANALYZE SELECT * FROM users ORDER BY username;
QUERY PLAN |
---------------------------------------------------------------------------------------------------------------------------------+
Index Scan using people_username_idx on users (cost=0.28..28.27 rows=1000 width=17) (actual time=0.052..1.494 rows=1000 loops=1)|
Planning Time: 0.186 ms |
Execution Time: 1.686 ms |
As you can see, the query planner has now chosen an Index Scan, since the total cost of that plan is 28.27 (lower than 69.33). It looks like the index scan was more efficient than the sequential scan, as the query execution time is now just under 2ms.
Helping the planner estimate more accurately
We can help the planner estimate more accurately in two ways:
- Help it gather better statistics
- Tune the constants it uses for the calculations
The statistics can be especially bad after a big change to the data in a table. As such, when loading a lot of data into a table, you can help Postgres out by running a manual ANALYZE on it. Statistics also do not persist over a major version upgrade, so that’s another important time to do this.
Naturally, tables also change over time, so tuning the autovacuum settings to make sure it runs frequently enough for your workload can be very helpful.
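As a rough sketch, you can also set more aggressive analyze thresholds on an individual busy table rather than changing them globally (the values below are illustrative, not a recommendation):
-- Re-analyze this table after roughly 2% of its rows change, instead of the
-- global default of 10%; the threshold of 50 rows stops tiny tables from being
-- re-analyzed constantly.
ALTER TABLE users SET (
    autovacuum_analyze_scale_factor = 0.02,
    autovacuum_analyze_threshold = 50
);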
If you’re having trouble with bad estimates for a column with a skewed distribution, you may benefit from increasing the amount of information Postgres gathers, either per column using the ALTER TABLE ... SET STATISTICS command, or by raising the default_statistics_target for the whole database.
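A minimal sketch of the per-column form (500 is an arbitrary value between the default of 100 and the maximum of 10000):
-- Gather larger most-common-value lists and histograms for the username column,
-- then refresh the statistics so the planner can see them.
ALTER TABLE users ALTER COLUMN username SET STATISTICS 500;
ANALYZE users;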
Another common cause of bad estimates is that, by default, Postgres will assume that two columns are independent. You can fix this by asking it to gather correlation data on two columns from the same table via extended statistics.
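As a sketch of the extended statistics syntax (the table and columns here are hypothetical, since our users table only has one interesting column):
-- Tell the planner that city and postcode are functionally dependent, so it
-- stops multiplying their selectivities together as if they were independent.
CREATE STATISTICS addresses_city_postcode (dependencies)
    ON city, postcode FROM addresses;
ANALYZE addresses;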
On the constant tuning front, there are a lot of parameters you can tune to suit your hardware. Assuming you’re running on SSDs, you’ll likely at minimum want to tune your setting of random_page_cost. This defaults to 4, which is 4x more expensive than the seq_page_cost we looked at earlier. This ratio made sense on spinning disks, but on SSDs it tends to penalize random I/O too much. As such, a setting closer to 1, or between 1 and 2, might make more sense. At ScaleGrid, we default to 1.
Can I remove the costs from query plans?
For many of the reasons mentioned above, most people leave the costs on when running EXPLAIN. However, should you wish, you can turn them off using the COSTS parameter.
EXPLAIN (COSTS OFF) SELECT * FROM users LIMIT 1;
QUERY PLAN |
-----------------------+
Limit |
-> Seq Scan on users|
Conclusion
To re-cap, the costs in query plans are Postgres’ estimates for how long an SQL query will take, in an arbitrary unit.
It picks the plan with the lowest overall cost, based on some configurable constants and some statistics it has gathered.
Helping it estimate these costs more accurately is very important to help it make good choices, and keep your queries performant.