YugabyteDB Cost-Based Optimizer and cost model for Distributed LSM-Tree
Franck Pachot
Posted on September 18, 2023
In OLTP database applications, a rule-based optimizer can suffice for many queries, provided that the data model includes appropriate indexes, and the queries remain straightforward. When dealing with more intricate queries involving joins and predicates with varying selectivity, you have the option to fine-tune the plan using Optimizer Hints. To fully leverage the declarative nature of SQL and allow the query planner to discover the optimal plan whintout hints, a Cost-Based Optimizer is employed, which estimates cardinalities and costs.
Here is how to enable the Cost Based Optimizer in YugabyteDB with two parameters:
- Set
yb_enable_optimizer_statistics
to enable the CBO with the PostgreSQL cost model. - Set
yb_enable_base_scans_cost_model
to incorporate the YugabyteDB cost model, which considers distributed operations and scans on the LSM-Tree structures.
In addition to setting those parameters, the optimizer statistcs that describe the data volumes and selectivity must have been gathered by ANALYZE
.
Simple table demo
I create a basic table with a primary key and a single indexed column that exhibits skewed values:
create extension if not exists pgcrypto;
create table demo as
select gen_random_uuid() id, least(10001,generate_series(1,20000)) n;
create index demo_n on demo(n);
analyze demo;
The table holds 10000 rows with distinct values, and an additional 10000 rows where the column "n" has a value of 10001
. I'll test two queries: one for the distinct value n=10000
and one for the popular value where n=10001
In general, an Index Scan is faster for the former scenario since it retrieves only a single row. However, for the latter scenario, which reads half of the rows randomly scattered rows throughout the table, a Full Table Scan (Seq Scan) might be the most efficient approach as it the rows reads sequentially.
Upon examining pg_stats
, updated by the ANALYZE command, it's evident that the query planner possesses sufficient information to consider this distribution when optimizing queries:
yugabyte=# select * from pg_stats
where tablename='demo' and attname='n';
-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | demo
attname | n
inherited | f
null_frac | 0
avg_width | 4
n_distinct | -0.50005
most_common_vals | {10001}
most_common_freqs | {0.5}
histogram_bounds | {1,100,200,300,400,500,600,700,800,900,1000,1100,1200,1300,1400,1500,1600,1700,1800,1900,2000,2100,2200,2300,2400,2500,2600,2700,2800,2900,3000,3100,3200,3300,3400,3500,3600,3700,3800,3900,4000,4100,4200,4300,4400,4500,4600,4700,4800,4900,5000,5100,5200,5300,5400,5500,5600,5700,5800,5900,6000,6100,6200,6300,6400,6500,6600,6700,6800,6900,7000,7100,7200,7300,7400,7500,7600,7700,7800,7900,8000,8100,8200,8300,8400,8500,8600,8700,8800,8900,9000,9100,9200,9300,9400,9500,9600,9700,9800,9900,10000}
correlation | 0.243699
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |
yugabyte=# \x
Expanded display is off.
yugabyte=#
The average frequency of each value is 0.50005
but there's more information about the popular value 10001
which has a frequency of frequency is 0.5
.
Rule-Based Optimizer
I'm running this with YugabyteDB 2.19.2 where the default is to have the two parameters disabled, then using the Rule Based Optimizer:
yugabyte=# set yb_enable_optimizer_statistics=off;
SET
yugabyte=# set yb_enable_base_scans_cost_model=off;
SET
yugabyte=# explain (verbose, analyze, summary off)
select * from demo where n=10000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Index Scan using demo_n on public.demo (cost=0.00..28.50 rows=200 width=20) (actual time=1.678..1.681 rows=1 loops=1)
Output: id, n
Index Cond: (demo.n = 10000)
(3 rows)
yugabyte=# explain (verbose, analyze, summary off)
select * from demo where n=10001;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Index Scan using demo_n on public.demo (cost=0.00..28.50 rows=200 width=20) (actual time=5.539..36.843 rows=10000 loops=1)
Output: id, n
Index Cond: (demo.n = 10001)
(3 rows)
Without using the optimizer statistics, there's no differentiation between the two predicates, and the plan remains consistent, utilizing the index.
This approach may be acceptable since YugabyteDB optimizes the Index Scan to minimize read requests:
yugabyte=# explain (verbose, analyze, dist, summary off) select * from demo where n=10000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Index Scan using demo_n on public.demo (cost=0.00..28.50 rows=200 width=20) (actual time=1.723..1.725 rows=1 loops=1)
Output: id, n
Index Cond: (demo.n = 10000)
Storage Table Read Requests: 1
Storage Table Read Execution Time: 0.616 ms
Storage Index Read Requests: 1
Storage Index Read Execution Time: 0.940 ms
(7 rows)
yugabyte=# explain (verbose, analyze, dist, summary off) select * from demo where n=10001;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Index Scan using demo_n on public.demo (cost=0.00..28.50 rows=200 width=20) (actual time=5.174..36.038 rows=10000 loops=1)
Output: id, n
Index Cond: (demo.n = 10001)
Storage Table Read Requests: 10
Storage Table Read Execution Time: 28.454 ms
Storage Index Read Requests: 10
Storage Index Read Execution Time: 1.281 ms
(7 rows)
Even when the Index Scan has to access 10000 table rows, those are batched to a minimum of read requests to reduce the impact of remote calls latency.
Cost Based Optimizer
I enabled the Cost Based Optimizer, inherited from PostgreSQL, without enabling the YugabyteDB cost model, and run the same queries:
yugabyte=# set yb_enable_optimizer_statistics=on;
SET
yugabyte=# set yb_enable_base_scans_cost_model=off;
SET
yugabyte=# explain (verbose, analyze, summary off)
select * from demo where n=10000;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Index Scan using demo_n on public.demo (cost=4.00..8.12 rows=1 width=20) (actual time=1.379..1.382 rows=1 loops=1)
Output: id, n
Index Cond: (demo.n = 10000)
(3 rows)
yugabyte=# explain (verbose, analyze, summary off)
select * from demo where n=10001;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Index Scan using demo_n on public.demo (cost=4.00..1233.00 rows=10000 width=20) (actual time=5.256..36.475 rows=10000 loops=1)
Output: id, n
Index Cond: (demo.n = 10001)
(3 rows)
In both cases, Index Scans are employed. However, with optimizer statistics in place, the estimated number of rows is precise, and the cost calculation accounts for the retrieval of a greater number of rows.
Why did the query planner opt for an Index Scan despite estimating a high cost for scanning 10000 rows? This is where optimizer hints becomes valuable to compare this cost with that of a Seq Scan:
yugabyte=# explain (verbose, analyze, summary off)
/*+ SeqScan(demo) */
select * from demo where n=10001;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
YB Seq Scan on public.demo (cost=4.00..2054.00 rows=10000 width=20) (actual time=6.696..11.028 rows=10000 loops=1)
Output: id, n
Remote Filter: (demo.n = 10001)
(3 rows)
Thanks to pg_hint_plan
, there's no room for uncertainty: The cost estimation for a Seq Scan is higher than that of an Index Scan on half the rows.
However, when considering actual execution time, the Seq Scan proves to be faster. This highlights the need to refine the PostgreSQL cost model to align it more accurately with the YugabyteDB architecture (distributed LSM-Trees). In addition to enabling the Cost Based Optimizer, I'll enable the new cost model.
YugabyteDB cost model
I'm setting both parameters to on
to use the Cost-Based Optimizer with the YugabyteDB Cost Model, and run the same queries:
yugabyte=# set yb_enable_optimizer_statistics=on;
SET
yugabyte=# set yb_enable_base_scans_cost_model=on;
SET
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Scan using demo_n on public.demo (cost=12.65..47.11 rows=1 width=20) (actual time=1.107..1.109 rows=1 loops=1)
Output: id, n
Index Cond: (demo.n = 10000)
Estimated Seeks: 2
Estimated Nexts: 0
(5 rows)
yugabyte=# explain (verbose, analyze, summary off) select * from demo where n=10001;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Seq Scan on public.demo (cost=6.33..12772.78 rows=10000 width=20) (actual time=3.853..9.128 rows=10000 loops=1)
Output: id, n
Remote Filter: (demo.n = 10001)
(3 rows)
The cost values cannot be compared to the previous one because it is a different model. A Seq Scan is employed when processing half of the rows, estimated with cost=12772.78
. Conversely, an Index Scan is chosen when accessing a single row, estimated with cost=47.11
.
It provides additional estimated information, specifically the number of seek() and next() operations. It's worth noting that seek() operations are more resource-intensive because they involve locating a key in the LSM-Tree, while next() operations are less costly since they simply read the next item in the sorted structure. This extra information is revealed when the verbose
option of explain
is used, in addition to displaying the columns that are being read.
To clarify why the query planner opted for a Sequential Scan instead of an Index Scan, check that the cost of an Index Scan, under this cost model, is higher (cost=69801.64
):
yugabyte=# explain (verbose, analyze, summary off)
/*+ IndexScan(demo) */
select * from demo where n=10001;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Index Scan using demo_n on public.demo (cost=12.65..69801.64 rows=10000 width=20) (actual time=10.879..42.286 rows=10000 loops=1)
Output: id, n
Index Cond: (demo.n = 10001)
Estimated Seeks: 10001
Estimated Nexts: 9999
(5 rows)
The additional information provided makes it clear why the cost is high in this case: for each index entry that is read, it necessitates an access to the table row with a seek() operation, 10000 in total. The remaining 1 seek() is the initial seek() into the first index entry, and the subsequent 9999 next() read sequentially the following index entries as long as they satisfy the index condition.
With the comprehensive information provided by the verbose
option, we can further optimize this scenario. The Index Scan is set to retrieve the columns "id" and "n". While "n" can be obtained from the index entry, retrieving "id" necessitates reading the table row. By incorporating the "id" column into the index, it becomes feasible to perform an Index Only Scan, which the query planner promptly recognizes and utilizes.
yugabyte=# create index demo_n_covering on demo(n) include(id);
CREATE INDEX
yugabyte=# explain (verbose, analyze, summary off)
select * from demo where n=10001;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo_n_covering on public.demo (cost=6.33..6490.78 rows=10000 width=20) (actual time=2.004..10.125 rows=10000 loops=1)
Output: id, n
Index Cond: (demo.n = 10001)
Heap Fetches: 0
Estimated Seeks: 1
Estimated Nexts: 9999
(6 rows)
In YugabyteDB, you'll consistently observe Heap Fetches: 0
. This metric originates from PostgreSQL, where achieving a true Index Only Scan is feasible only when the table was not updated since the last vacuum. However, it's worth noting that YugabyteDB's MVCC implementation effectively eliminates the need for any VACUUM-related concern.
As anticipated, the estimated seek() and next() operations mirror those of the index range scan in the previous case, but without the additional 10000 seeks() into the table's LSM-Tree.
In this scenario, the new covering index can replace the old one, which can subsequently be dropped. To summarize, the Cost-Based Optimizer with the YugabyteDB cost model empowers the query planner to identify the optimal execution plan based on the provided indexes. The EXPLAIN VERBOSE option, and analysis with optimizer hints, offer additional insights that help enhancing the indexing strategy, enabling more efficient access paths that the query planner will automatically select when available.
Posted on September 18, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
December 23, 2023
November 18, 2023