Ahmed Hisham
Posted on May 4, 2023
In this blog we will explore how PostgreSQL EXPLAIN command estimates the cost of a query, we will start by setting up a test database, then we will run a sequential scan command and estimate it manually and then compare it with EXPLAIN result.
Setting up a database
First we will create a test database and insert some values into it. Open a postgres instance and create a testdb
database:
postgres=# CREATE DATABASE testdb;
postgres=# \c testdb
Create a table with two columns as follows:
testdb=# CREATE TABLE test_tbl (id INTEGER PRIMARY KEY, order_num INTEGER);
Now we want to insert some random rows into the table:
testdb=# INSERT INTO test_tbl SELECT generate_series(1, 10000), generate_series(1,10000);
INSERT 0 10000
To make sure everything is correct, run:
testdb=# SELECT count(*) FROM test_tbl;
count
-------
10000
(1 row)
Estimating The Cost Of A Sequential Scan Query
A sequential scan query means that query will need to scan the entire table, going over every single tuple (row). We will run this simple sequential scan query to make our tests on it SELECT * FROM test_tbl
, we will start by printing out the cost of this query using EXPLAIN
command:
testdb=# EXPLAIN SELECT * FROM test_tbl;
QUERY PLAN
--------------------------------------------------------------
Seq Scan on test_tbl (cost=0.00..145.00 rows=10000 width=8)
(1 row)
You can notice the estimated cost has two numbers 0.00 and 145.00, yes they are two numbers, they represent the start-up and the total costs respectively.
In PostgreSQL, there are three kinds of costs: start-up, run and total. The total cost is the sum of start-up and run costs.
The start-up cost is the cost expended before exactly fetching the first tuple in the table.
The run cost is the cost to fetch all tuples.
The total cost as described is the sum of the costs of both start-up and run costs.
Now let's compute the cost ourselves according to the website here the formula is:
Total cost = (cpu run cost) + (disk run cost)
Total cost = (cpu_tuple_cost + cpu_operator_cost) * N_tuples + seq_page_cost * N_pages
Don't be shocked it is simple, let's understand each variable alone:
-
cpu_tuple_cost
: This is the cost of processing each tuple (row) in a query result. The default value is 0.01. -
cpu_operator_cost
: This is the cost of processing each operator or function call in a query. The default value is 0.0025. -
seq_page_cost
: This is the cost of reading a page from disk sequentially. The default value is 1.0. -
N_tuples
: This is the number of tuples (rows) returned by the query. -
N_pages
: This is the number of pages read from disk during the query.
All variables are known except for N_tuples
and N_pages
which we can by running the following command:
testdb=# SELECT relpages, reltuples FROM pg_class WHERE relname = 'test_tbl';
relpages | reltuples
----------+-----------
45 | 10000
(1 row)
As you can notice, the number of tuples is 10000 which we inserted, and the number of pages is 45 (a page is a block of data that is read from or written to disk as a unit, it is a section inside the table file). Now we have everything we need to use the equation:
Total cost = (0.01 + 0.0025) * 10000 + 1.0 * 45
= 102.5 + 45
= 147.5
The result is 147.5!, which is almost the same as what the EXPLAIN
command estimated.
I hope that was helpful.
Posted on May 4, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.