Loose Index Scan aka Skip Scan in PostgreSQL 🐘🚀
Franck Pachot
Posted on July 26, 2022
In a previous post, I was looking at the solutions proposed by @ryanbooz in his Select the Most Recent Record (of Many Items) With PostgreSQL , and how they map to YugabyteDB. Ryan works with Timescale which implements Index Skip Scan transparently for a DISTINCT ON (truck_id). He mentions the PostgreSQL alternative with Recursive CTE documented as Loose indexscan but also that the biggest drawback with this approach is that it's much more difficult to return multiple columns of data. This shows a DISTINCT returning only one column.
Here is my version, working with additional columns, for the fastest answer to the Most recent record (of many items) when you have the right index.
I'm creating the same tables as in the previous post (and Ryan Booz article):
drop table truck_readings;
create table truck_readings (
truck_id bigint
,ts timestamptz
,milage int
,primary key(truck_id, ts)
);
I am using YugabyteDB here, open-source distributed SQL database which is compatible with PostgreSQL. The syntax is the same but the rows are distributed, by default, by hash on the first column.
I insert some rows: ten thousand readings for each of the two thousand trucks.
with trucks as (select generate_series(1,2000) truck_id)
insert into truck_readings(truck_id, ts, milage)
select truck_id, now() -
interval '1 second' * generate_series(1,10000)
, 42 from trucks;
analyze truck_readings;
At this point, to get the last reading for each truck, we have the same solutions as in the previous post. A list of trucks helps, and maintaining the last value by a trigger is one efficient solution.
For this post, I use a technique similar to an Index Skip Scan, but driven from SQL by a WITH RECURSIVE clause. In PostgreSQL, which is not distributed, or YugabyteDB when you sharded by range, you don't need another index. But if you are hash sharding, or simply when you have a generated primary key, you need a secondary index. Here it is:
create index truck_last_reading on truck_readings
( truck_id asc, ts asc) include (milage);
This index is ordered by truck_id
first, and then by ts
, the timestamp. It includes all columns I need. Here I want the mileage for the last reading of each truck.
Note: if you wonder if it is a good idea to have milage
as include
... it depends. Very good remarks from Michael and Nikolay at
https://youtu.be/8wQ9NHQTQZQ?t=1059
My goal is to start at the end of the index so that I have the last reading for the last truck:
yugabyte=#
select
last_truck_last_reading.truck_id,
last_truck_last_reading.milage
from truck_readings last_truck_last_reading
order by
last_truck_last_reading.truck_id desc,
last_truck_last_reading.ts desc
limit 1;
truck_id | milage
----------+--------
2000 | 42
(1 row)
Then skip to the next one:
yugabyte=#
select truck_id,milage from truck_readings
where truck_id < 2000
order by truck_id desc, ts desc limit 1;
truck_id | milage
----------+--------
1999 | 42
I can combine them with a join to get the truck_id
from the previous one:
with truck_last_reading as (
select
last_truck_last_reading.truck_id,
last_truck_last_reading.milage
from truck_readings last_truck_last_reading
order by
last_truck_last_reading.truck_id desc,
last_truck_last_reading.ts desc
limit 1
)
select
next_truck_last_reading.truck_id,
next_truck_last_reading.milage
from truck_last_reading, lateral
(
select truck_id,milage from truck_readings
where truck_id < truck_last_reading.truck_id
order by truck_id desc, ts desc limit 1
)
as next_truck_last_reading;
WITH RECURSIVE ... LATERAL
Finally, I can show both rows with a UNION ALL. And do this recursively to find the next truck_id
and concatenate it to the result:
with recursive truck_last_reading as (
(
select
last_truck_last_reading.truck_id,
last_truck_last_reading.milage
from truck_readings last_truck_last_reading
order by
last_truck_last_reading.truck_id desc,
last_truck_last_reading.ts desc
limit 1
)
union all
select
next_truck_last_reading.truck_id,
next_truck_last_reading.milage
from truck_last_reading, lateral
(
select truck_id,milage from truck_readings
where truck_id < truck_last_reading.truck_id
order by truck_id desc, ts desc limit 1
)
as next_truck_last_reading
) select * from truck_last_reading
order by truck_id,milage;
This is the final query which gives the required result (the last reading, in ts
order, for each truck_id
and return the milage
). The execution plan is optimal:
,
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (actual time=1078.569..1078.661 rows=2000 loops=1)
Output: truck_last_reading.truck_id, truck_last_reading.milage
Sort Key: truck_last_reading.truck_id, truck_last_reading.milage
Sort Method: quicksort Memory: 142kB
CTE truck_last_reading
-> Recursive Union (actual time=0.872..1075.442 rows=2000 loops=1)
-> Subquery Scan on "*SELECT* 1" (actual time=0.871..0.872 rows=1 loops=1)
Output: "*SELECT* 1".truck_id, "*SELECT* 1".milage
-> Limit (actual time=0.871..0.871 rows=1 loops=1)
Output: last_truck_last_reading.truck_id, last_truck_last_reading.milage, last_truck_last_reading.ts
-> Index Only Scan Backward using truck_last_reading on public.truck_readings last_truck_last_reading (actual time=0.870..0.870 rows=1 loops=1)
Output: last_truck_last_reading.truck_id, last_truck_last_reading.milage, last_truck_last_reading.ts
Heap Fetches: 0
-> Nested Loop (actual time=0.525..0.526 rows=1 loops=2000)
Output: truck_readings.truck_id, truck_readings.milage
-> WorkTable Scan on truck_last_reading truck_last_reading_1 (actual time=0.000..0.000 rows=1 loops=2000)
Output: truck_last_reading_1.truck_id, truck_last_reading_1.milage
-> Limit (actual time=0.524..0.525 rows=1 loops=2000)
Output: truck_readings.truck_id, truck_readings.milage, truck_readings.ts
-> Index Only Scan Backward using truck_last_reading on public.truck_readings (actual time=0.513..0.513 rows=1 loops=2000)
Output: truck_readings.truck_id, truck_readings.milage, truck_readings.ts
Index Cond: (truck_readings.truck_id < truck_last_reading_1.truck_id)
Heap Fetches: 0
-> CTE Scan on truck_last_reading (actual time=0.874..1077.252 rows=2000 loops=1)
Output: truck_last_reading.truck_id, truck_last_reading.milage
Planning Time: 0.165 ms
Execution Time: 1078.821 ms
Peak Memory Usage: 337 kB
(28 rows)
There's no workarea memory needed because only one row is read for each iteration. This uses the index for its best usage: one Index Only Scan
for each truck_id
, and without the need to get a list of truck_id
first. Each read from the index seeks directly to the right place and reads only one row. YugabyteDB pushes down the LIMIT 1 to the storage tablet server.
In theory this access should be possible with hash sharding on the truck_id
, and sorting on the hash rather than the value. The goal is just to get to the next truck_id
, then an order by yb_hash_code(truck_id) desc, truck_id desc, ts desc
could use the same bit there's no implementation for it. Please open a git issue if you need it. But you can also declare the table with primary key (truck_id desc, ts desc)
and pre-split it or let auto-splitting do it when table grows. Note that I've run this example on version 2.15.1 because 2.15.0 had an issue with this.
Posted on July 26, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.