SELECT DISTINCT pushdown to do a loose index scan (skip scan)
Franck Pachot
Posted on February 8, 2023
In the previous post I used WITH RECURSIVE to emulate a DISTINCT ON, by scanning an index with a skip to the next distinct value on the first column. This is known as skip scan or loose index scan. YugabyteDB can do it when scanning an index with a condition on the second column, thanks to the hybrid scan method.
In the last release (version 2.17.1.0 build 439) a new pushdown has been introduced for DISTINCT. This is described in the commit.
Without DISTINCT pushdown
With the table and index created in the previous post, listing the 2000 truck_id from 20 million records takes more than 30 seconds even with an Index Scan:
yugabyte=# explain (costs off, analyze, dist) select distinct truck_id from truck_readings;
QUERY PLAN
---------------------------------------------------------------------------------------
HashAggregate (actual time=42093.894..42094.277 rows=2000 loops=1)
Group Key: truck_id
-> Seq Scan on truck_readings (actual time=2.490..38914.325 rows=20000000 loops=1)
Storage Table Read Requests: 19533
Storage Table Execution Time: 35100.091 ms
Planning Time: 0.065 ms
Execution Time: 42094.523 ms
Storage Read Requests: 19533
Storage Write Requests: 0
Storage Execution Time: 35100.091 ms
Peak Memory Usage: 9081 kB
yugabyte=# /*+ IndexOnlyScan(truck_readings) */ explain (costs off, analyze, dist) select distinct truck_id from truck_readings;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Unique (actual time=3.034..47393.512 rows=2000 loops=1)
-> Index Only Scan using truck_last_reading on truck_readings (actual time=3.032..45894.507 rows=20000000 loops=1)
Heap Fetches: 0
Storage Index Read Requests: 19532
Storage Index Execution Time: 38068.095 ms
Planning Time: 0.098 ms
Execution Time: 47394.577 ms
Storage Read Requests: 19532
Storage Write Requests: 0
Storage Execution Time: 38068.095 ms
Peak Memory Usage: 8 kB
(11 rows)
In both cases, Seq Scan or Index Only Scan, 20 million rows has been fetched from the storage, and aggregated in the SQL layer.
Note that the queries above do not use the new feature, even in version 2.17.1.0 build 439 because:
- without hint, the SELECT without WHERE clause doesn't use the index
- with IndexOnlyScan hint, the new feature is not used
With DISTINCT pushdown
If I force a range scan with a WHERE clause (I use truck_id>=0
here but you can also use truck_id>(select min(truck_id) from truck_readings)
because the aggregate is also pushed down, and fast), the DISTINCT is pushed down:
yugabyte=# explain (costs off, analyze, dist, verbose)
select distinct truck_id from truck_readings
where truck_id>=0;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Unique (actual time=8.614..17.083 rows=2000 loops=1)
Output: truck_id
-> Index Only Scan using truck_last_reading on public.truck_readings (actual time=8.612..16.715 rows=2001 loops=1)
Output: truck_id
Index Cond: (truck_readings.truck_id >= 0)
Heap Fetches: 0
Storage Index Read Requests: 3
Storage Index Execution Time: 16.000 ms
Planning Time: 0.071 ms
Execution Time: 17.197 ms
Storage Read Requests: 3
Storage Write Requests: 0
Storage Execution Time: 16.000 ms
Peak Memory Usage: 8 kB
(14 rows)
The main difference, besides the response time, is the number of rows returned from the Index Scan to the SQL layer: rows=2001
has skipped all rows with duplicate truck_id
DISTINCT ON without LATERAL instead of RECURSIVE
I will apply this to simplify the query in my previous post. I'll still use a CTE (Common Table Expression, aka WITH clause) for better readability, but it is not recursive ans simpler to understand:
with
-- get the first value (for fun, but can use the known minimum)
"first" as (
select min(truck_id) truck_id from truck_readings
),
-- use pushed down DISTINCT to get the list of truck_id
"distinct" as (
select distinct truck_id from truck_readings
where truck_id>=(select truck_id from "first")
)
-- get the last reading for each truck
select truck_readings.* from "distinct", lateral (
select * from truck_readings
where truck_id= "distinct".truck_id
order by ts desc limit 1
) as truck_readings
;
The explain (analyze, costs off, dist, verbose, buffers)
of it is shows how it works:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (actual time=11.454..9108.708 rows=2000 loops=1)
CTE first
-> Result (actual time=0.706..0.706 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (actual time=0.703..0.704 rows=1 loops=1)
-> Index Only Scan using truck_last_reading on truck_readings truck_readings_1 (actual time=0.702..0.702 rows=1 loops=1)
Index Cond: (truck_id IS NOT NULL)
Heap Fetches: 0
Storage Index Read Requests: 1
Storage Index Execution Time: 0.000 ms
CTE distinct
-> Unique (actual time=1.374..22.448 rows=2000 loops=1)
InitPlan 3 (returns $2)
-> CTE Scan on first (actual time=0.707..0.708 rows=1 loops=1)
-> Index Only Scan using truck_last_reading on truck_readings truck_readings_2 (actual time=1.373..20.668 rows=2001 loops=1)
Index Cond: (truck_id >= $2)
Heap Fetches: 0
Storage Index Read Requests: 4
Storage Index Execution Time: 20.000 ms
-> CTE Scan on "distinct" (actual time=1.376..24.344 rows=2000 loops=1)
-> Limit (actual time=4.540..4.541 rows=1 loops=2000)
-> Index Only Scan Backward using truck_last_reading on truck_readings (actual time=4.529..4.529 rows=1 loops=2000)
Index Cond: (truck_id = "distinct".truck_id)
Heap Fetches: 0
Storage Index Read Requests: 2
Storage Index Execution Time: 4.496 ms
Planning Time: 0.221 ms
Execution Time: 9109.490 ms
Storage Read Requests: 3042
Storage Write Requests: 0
Storage Execution Time: 9012.023 ms
Peak Memory Usage: 354 kB
(32 rows)
Time: 9136.663 ms (00:09.137)
The execution is approximately two times faster than WITH RECURSIVE. The main difference is that I used an inequality (truck_id <
) to get to the right row in each loop, but here it is an equality (truck_id =
).
Posted on February 8, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
May 16, 2024
January 30, 2024