How Does PostgreSQL Estimate The Cost Of A Query With The EXPLAIN Command

mghrabi

Ahmed Hisham

Posted on May 4, 2023

How Does PostgreSQL Estimate The Cost Of A Query With The EXPLAIN Command

In this blog we will explore how PostgreSQL EXPLAIN command estimates the cost of a query, we will start by setting up a test database, then we will run a sequential scan command and estimate it manually and then compare it with EXPLAIN result.

Setting up a database

First we will create a test database and insert some values into it. Open a postgres instance and create a testdb database:

postgres=# CREATE DATABASE testdb;
postgres=# \c testdb
Enter fullscreen mode Exit fullscreen mode

Create a table with two columns as follows:

testdb=# CREATE TABLE test_tbl (id INTEGER PRIMARY KEY, order_num INTEGER);
Enter fullscreen mode Exit fullscreen mode

Now we want to insert some random rows into the table:

testdb=# INSERT INTO test_tbl SELECT generate_series(1, 10000), generate_series(1,10000);
INSERT 0 10000
Enter fullscreen mode Exit fullscreen mode

To make sure everything is correct, run:

testdb=# SELECT count(*) FROM test_tbl;
count
-------
 10000
(1 row)
Enter fullscreen mode Exit fullscreen mode

Estimating The Cost Of A Sequential Scan Query

A sequential scan query means that query will need to scan the entire table, going over every single tuple (row). We will run this simple sequential scan query to make our tests on it SELECT * FROM test_tbl, we will start by printing out the cost of this query using EXPLAIN command:

testdb=# EXPLAIN SELECT * FROM test_tbl;
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on test_tbl  (cost=0.00..145.00 rows=10000 width=8)
(1 row)
Enter fullscreen mode Exit fullscreen mode

You can notice the estimated cost has two numbers 0.00 and 145.00, yes they are two numbers, they represent the start-up and the total costs respectively.

In PostgreSQL, there are three kinds of costs: start-up, run and total. The total cost is the sum of start-up and run costs.

  • The start-up cost is the cost expended before exactly fetching the first tuple in the table.

  • The run cost is the cost to fetch all tuples.

  • The total cost as described is the sum of the costs of both start-up and run costs.

Now let's compute the cost ourselves according to the website here the formula is:

Total cost = (cpu run cost) + (disk run cost)
Total cost = (cpu_tuple_cost + cpu_operator_cost) * N_tuples + seq_page_cost * N_pages

Don't be shocked it is simple, let's understand each variable alone:

  • cpu_tuple_cost: This is the cost of processing each tuple (row) in a query result. The default value is 0.01.
  • cpu_operator_cost: This is the cost of processing each operator or function call in a query. The default value is 0.0025.
  • seq_page_cost: This is the cost of reading a page from disk sequentially. The default value is 1.0.
  • N_tuples: This is the number of tuples (rows) returned by the query.
  • N_pages: This is the number of pages read from disk during the query.

All variables are known except for N_tuples and N_pages which we can by running the following command:

testdb=# SELECT relpages, reltuples FROM pg_class WHERE relname = 'test_tbl';
 relpages | reltuples
----------+-----------
       45 |     10000
(1 row)
Enter fullscreen mode Exit fullscreen mode

As you can notice, the number of tuples is 10000 which we inserted, and the number of pages is 45 (a page is a block of data that is read from or written to disk as a unit, it is a section inside the table file). Now we have everything we need to use the equation:
Total cost = (0.01 + 0.0025) * 10000 + 1.0 * 45
= 102.5 + 45
= 147.5

The result is 147.5!, which is almost the same as what the EXPLAIN command estimated.

I hope that was helpful.

💖 💪 🙅 🚩
mghrabi
Ahmed Hisham

Posted on May 4, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related