pg_hint_plan for Batched Nested Loop Join in YugabyteDB

franckpachot

Franck Pachot

Posted on January 1, 2024

pg_hint_plan for Batched Nested Loop Join in YugabyteDB

In my previous post I mentioned that the Batched Nested Loop feature has been improved and is now enabled by default. However, if you previously used this feature during its preview phase and utilized optimizer hints, there is an important change that you should be aware of. In short, it was an alternative to Nested Loop in preview, but now it is a full-fledged join method completing the existing ones: Nested Loop, Merge Join, and Hash Join.


TL;DR: If you used the NestLoop hint in the previous version to enforce a Nested loop and set yb_bnl_batch_size higher than 1 to batch it, you need to replace the hint with YBbatchedNL. This is because NestLoop hint will now enforce only the non-batched one.


I've started a cluster with the Northwind demo and two additional indexes to join customers to orders:

yugabyted start
yugabyted demo connect
\d

create index customers_city     on customers(city, customer_id);
create index orders_customer_id on orders(customer_id);
Enter fullscreen mode Exit fullscreen mode

Here is the execution plan in YugabyteDB 2.21 with all defaults and no hints:

yb_demo_northwind=# explain (analyze, costs off)
 select * from customers
 join orders using(customer_id)
where city='London'
order by customer_id;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Sort (actual time=10.076..10.077 rows=46 loops=1)
   Sort Key: customers.customer_id
   Sort Method: quicksort  Memory: 37kB
   ->  Hash Join (actual time=9.438..10.027 rows=46 loops=1)
         Hash Cond: (orders.customer_id = customers.customer_id)
         ->  Seq Scan on orders (actual time=2.336..2.734 rows=830 loops=1)
         ->  Hash (actual time=7.053..7.053 rows=6 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Index Scan using customers_city on customers (actual time=7.018..7.029 rows=6 loops=1)
                     Index Cond: ((city)::text = 'London'::text)
 Planning Time: 0.248 ms
 Execution Time: 10.169 ms
 Peak Memory Usage: 315 kB
(13 rows)
Enter fullscreen mode Exit fullscreen mode

In the previous version, to enable a batched nested loop you used the NestedLoop hint for the join method and set yb_bnl_batch_size to enabled Batched Nested Loop.

If you re-use the same hint in 2.21 you will see the non-batched nested loop:

yb_demo_northwind=# explain (analyze, costs off)
 /*+ NestLoop(customers orders) Set(yb_bnl_batch_size 1024) */
 select * from customers
 join orders using(customer_id)
where city='London'
order by customer_id;

                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Nested Loop (actual time=2.692..6.457 rows=46 loops=1)
   ->  Index Scan using customers_city on customers (actual time=1.440..1.449 rows=6 loops=1)
         Index Cond: ((city)::text = 'London'::text)
   ->  Index Scan using orders_customer_id on orders (actual time=0.801..0.808 rows=8 loops=6)
         Index Cond: (customer_id = customers.customer_id)
 Planning Time: 0.215 ms
 Execution Time: 6.526 ms
 Peak Memory Usage: 56 kB
(8 rows)
;
Enter fullscreen mode Exit fullscreen mode

This does the same a PostgreSQL Nested Loop with one loop for each outer row (rows=6 / loops=6 )

To force a Batched Nested Loop you don't need to set the batch size, but use the new hint YBbatchedNL:


explain (analyze, costs off)
 /*+ YBbatchedNL(customers orders) */
 select * from customers
 join orders using(customer_id)
where city='London'
order by customer_id;
;

                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 YB Batched Nested Loop Join (actual time=3.944..3.958 rows=46 loops=1)
   Join Filter: (customers.customer_id = orders.customer_id)
   Sort Keys: customers.customer_id
   ->  Index Scan using customers_city on customers (actual time=2.620..2.653 rows=6 loops=1)
         Index Cond: ((city)::text = 'London'::text)
   ->  Index Scan using orders_customer_id on orders (actual time=1.097..1.136 rows=46 loops=1)
         Index Cond: (customer_id = ANY (ARRAY[customers.customer_id, $1, $2, ..., $1023]))
 Planning Time: 0.654 ms
 Execution Time: 4.289 ms
 Peak Memory Usage: 593 kB
(10 rows)
Enter fullscreen mode Exit fullscreen mode

Now, all rows were fetched in one loop (loops=1)

If you didn't use a hint but session parameters to disable other methods, you can do the same. However, now, disabling Nested Loop with enable_nestloop=off still allows the Batched Nested Loop:

yb_demo_northwind=# set enable_hashjoin=off;
SET
yb_demo_northwind=# set enable_nestloop=off;
SET

yb_demo_northwind=# explain (analyze, costs off)
 select * from customers
 join orders using(customer_id)
where city='London'
order by customer_id;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 YB Batched Nested Loop Join (actual time=2.908..2.926 rows=46 loops=1)
   Join Filter: (customers.customer_id = orders.customer_id)
   Sort Keys: customers.customer_id
   ->  Index Scan using customers_city on customers (actual time=1.355..1.365 rows=6 loops=1)
         Index Cond: ((city)::text = 'London'::text)
   ->  Index Scan using orders_customer_id on orders (actual time=1.339..1.381 rows=46 loops=1)
         Index Cond: (customer_id = ANY (ARRAY[customers.customer_id, $1, $2, ..., $1023]))
 Planning Time: 0.648 ms
 Execution Time: 3.213 ms
 Peak Memory Usage: 657 kB
(10 rows)
Enter fullscreen mode Exit fullscreen mode

If you want to disable all Nested Loops, you need to also set yb_enable_batchednl=off:

yb_demo_northwind=# set enable_hashjoin=off;
SET
yb_demo_northwind=# set enable_nestloop=off;
SET
yb_demo_northwind=# set yb_enable_batchednl=off;
SET

yb_demo_northwind=# explain (analyze, costs off)
 select * from customers
 join orders using(customer_id)
where city='London'
order by customer_id;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Merge Join (actual time=5.839..6.334 rows=46 loops=1)
   Merge Cond: (customers.customer_id = orders.customer_id)
   ->  Index Scan using customers_city on customers (actual time=1.961..1.983 rows=6 loops=1)
         Index Cond: ((city)::text = 'London'::text)
   ->  Sort (actual time=3.851..3.897 rows=679 loops=1)
         Sort Key: orders.customer_id
         Sort Method: quicksort  Memory: 150kB
         ->  Seq Scan on orders (actual time=1.992..2.717 rows=830 loops=1)
 Planning Time: 0.306 ms
 Execution Time: 6.465 ms
 Peak Memory Usage: 373 kB
(11 rows)
Enter fullscreen mode Exit fullscreen mode

Finally if I reset all parameters but analyze the tables, The Batched Nested Loop is chosen by the query planner:

yb_demo_northwind=# \c
You are now connected to database "yb_demo_northwind" as user "yugabyte".

yb_demo_northwind=# analyze;
WARNING:  'analyze' is a beta feature!
ANALYZE

                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 YB Batched Nested Loop Join (actual time=7.830..7.851 rows=46 loops=1)
   Join Filter: (customers.customer_id = orders.customer_id)
   Sort Keys: customers.customer_id
   ->  Index Scan using customers_city on customers (actual time=1.058..1.066 rows=6 loops=1)
         Index Cond: ((city)::text = 'London'::text)
   ->  Index Scan using orders_customer_id on orders (actual time=6.150..6.208 rows=46 loops=1)
         Index Cond: (customer_id = ANY (ARRAY[customers.customer_id, $1, $2, ..., $1023]))
 Planning Time: 17.507 ms
 Execution Time: 8.226 ms
 Peak Memory Usage: 602 kB
(10 rows)
Enter fullscreen mode Exit fullscreen mode

Letting the query planner choose the join method is probably the best, especially if you also set the Cost Based Optimizer with yb_enable_optimizer_statistics and yb_enable_base_scans_cost_model. But if you need to disable batching for a specific join, you can use YBbatchedNL

yb_demo_northwind=# explain (analyze, costs off)
 /*+ NoYBbatchedNL(customers orders) */
 select * from customers
 join orders using(customer_id)
where city='London'
order by customer_id;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Nested Loop (actual time=2.573..7.722 rows=46 loops=1)
   ->  Index Scan using customers_city on customers (actual time=1.548..1.558 rows=6 loops=1)
         Index Cond: ((city)::text = 'London'::text)
   ->  Index Scan using orders_customer_id on orders (actual time=0.989..0.999 rows=8 loops=6)
         Index Cond: (customer_id = customers.customer_id)
 Planning Time: 0.433 ms
 Execution Time: 7.818 ms
 Peak Memory Usage: 56 kB
(8 rows)
Enter fullscreen mode Exit fullscreen mode
💖 💪 🙅 🚩
franckpachot
Franck Pachot

Posted on January 1, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related