GBase Database | GBase 8a Distributed Execution Plan Principles
Cong Li
Posted on November 1, 2024
GBase Database (GBase 数据库) is a high-performance relational database widely used in data analysis and performance optimization scenarios.
1. Overview
GBase 8a MPP Cluster is a distributed parallel database cluster built on a Shared Nothing architecture. It offers high performance, high availability, and high scalability. Leveraging an MPP + Shared Nothing design, the cluster executes queries in a distributed, parallel manner, decomposing each query into parallel tasks that run across the nodes. This article takes an in-depth look at the distributed implementation principles of the frequently used join and group by operators, along with the key GBase 8a parameters that affect the distributed execution plan, including their mechanisms and influence on execution.
2. Principles of GBase 8a Distributed Execution Plan
2.1 Data Distribution in GBase 8a
Tables in GBase 8a appear to users as logical tables, just as in a standalone database. Physically, a table is stored with one of two data distribution types:
- Distributed Table: The table's data is spread across the cluster nodes, with each node storing a data fragment (physical table) that represents a portion of the logical table. GBase 8a supports two distribution strategies:
  - Hash Distribution: One or more columns specified by distributed by are used to calculate a hash value that determines the node where each row resides. Both single-column and multi-column hash distribution are supported.
  - Random Distribution: Data is distributed randomly across nodes.
- Replicated Table: Each node stores an identical copy of the complete data.
2.2 Hash Distribution Principles
In GBase 8a, hash distribution operates as follows, given a table definition:
create table emp (pname varchar(50), position varchar(50), ...) distributed by('pname');
Here, pname serves as the hash distribution column, meaning its values are used to calculate the hash value.
The hash function in GBase 8a is f(x) = crc32(x) % 65536: the hash value is the crc32 of the distribution-column value taken modulo 65536, which maps every row into the range 0 to 65535. The mapping between hash values and data shards (physical tables) is stored in the system table gbase.nodedatamap, where the hashkey column enumerates the 65536 hash values and the nodeid column gives the physical table number (1 to N) associated with each value. The shard-to-node binding is established with the gcadmin distribution command and can be verified with gcadmin showdistribution.
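As a quick illustration (assuming a MySQL-compatible crc32() function is available in your GBase 8a version; the sample value 'Smith' is purely illustrative), you can compute a row's hash bucket and then look up the shard that owns it:

```sql
-- Minimal sketch: compute the hash bucket (0-65535) for a sample pname value.
-- Assumes a MySQL-compatible crc32() function is available.
select crc32('Smith') % 65536 as hashkey;

-- Look up which physical table (shard) owns that bucket in the system table.
select nodeid
from gbase.nodedatamap
where hashkey = crc32('Smith') % 65536;
```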
2.3 Distributed Join Execution Plan Principles
This section describes the implementation principles of distributed join operations, specifically equality-based joins (range joins such as between and other inequality comparisons are excluded).
There are two basic models for distributed joins:
1) Distributed Table Join Distributed Table: Both tables have matching data distribution, meaning the equality condition includes the hash distribution column of each table.
2) Distributed Table Join Replicated Table: One table is a distributed table, and the other is a replicated table (typically with the right table being the replicated table in a left join).
In these two models, the join can be executed in parallel across the nodes. If neither model applies, the data is first redistributed so that the parallel join can then proceed according to one of these models.
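For a concrete picture of Model 1, here is a minimal sketch with two illustrative tables co-located on the same distribution column (the table and column names are made up for the example; the DDL for a replicated table in Model 2 is omitted, as its exact syntax depends on the version):

```sql
-- Model 1: both tables hash-distributed on the join column (illustrative names).
create table orders (order_id int, cust_id int, amount decimal(10,2)) distributed by('cust_id');
create table customers (cust_id int, cname varchar(50)) distributed by('cust_id');

-- The equality condition covers both distribution columns, so each node joins
-- its own shards in parallel with no data movement.
select o.order_id, c.cname
from orders o join customers c on o.cust_id = c.cust_id;
```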
When redistribution is required, it takes one of the following three forms:
- Small Table Broadcast Join Execution Plan
- Single Table Redistribution Join Execution Plan
- Dual Table Redistribution Join Execution Plan
2.3.1 Small Table Broadcast Join Execution Plan
This execution plan is used when:
- Both tables are distributed tables.
- Table T1 is a large table, and T2 is a small table.
- The join condition does not include an equality condition on both tables' distribution columns.
Execution:
1) Table T2 (the small table) is broadcast to every node, creating a temporary replicated table tmp2.
2) The join then follows Model 2 (distributed table join replicated table).
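A minimal sketch of a query that can map to this plan, assuming illustrative tables t_big (distributed by 'a', large) and t_small (distributed by 'b', small):

```sql
-- The join condition does not cover both distribution columns, and t_small is
-- small, so the optimizer can broadcast t_small to every node as a temporary
-- replicated table (tmp2) and run the join locally per Model 2.
select t_big.a, t_small.b
from t_big join t_small on t_big.c = t_small.x;
```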
2.3.2 Single Table Redistribution Join Execution Plan
This execution plan is used when:
- Both tables are distributed tables.
- The join condition includes an equality between T1's distribution column and a non-distribution column of T2.
Execution:
1) Table T2 is redistributed by the column in the join equality condition, creating a distributed table tmp2.
2) The join then follows Model 1 (distributed table join distributed table).
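A minimal sketch, assuming illustrative tables t1 (distributed by 'id') and t2 (distributed by 'k'):

```sql
-- t1.id is t1's distribution column; t2.ref is not t2's distribution column.
-- t2 is therefore redistributed by 'ref' into a temporary table (tmp2), after
-- which each node joins its local shards in parallel per Model 1.
select t1.id, t2.v
from t1 join t2 on t1.id = t2.ref;
```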
2.3.3 Dual Table Redistribution Join Execution Plan
This execution plan is used when:
- Both tables are distributed tables.
- The join condition does not include the distribution column of either table.
Execution:
1) Redistribute T1 by its column in the equality condition, creating a temporary table tmp1.
2) Redistribute T2 by its column in the equality condition, creating a temporary table tmp2.
3) The join then follows Model 1 (distributed table join distributed table).
Note: If there are multiple equality conditions, the last condition in the list is used for redistribution.
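A minimal sketch, assuming illustrative tables t1 (distributed by 'a') and t2 (distributed by 'b'):

```sql
-- Neither t1.x nor t2.y is a distribution column, so both tables are
-- redistributed by their join columns (into tmp1 and tmp2) before the
-- parallel join runs per Model 1.
select t1.a, t2.b
from t1 join t2 on t1.x = t2.y;
```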
2.4 Distributed Group By Execution Plan Principles
Like joins, distributed group by has basic models that allow straightforward parallel execution on the nodes. The two models are:
1) Model 1:
- The group by table is a hash-distributed table.
- The group by columns include the table's hash distribution column.
- Execution: Each node performs the group by in parallel, and the results are merged and sorted at the coordinator node.
2) Model 2:
- The group by table is a replicated table.
- Execution: A single node executes the group by operation and sends the result to the coordinator node.
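Using the emp table from section 2.2 (distributed by 'pname'), a group by on the distribution column fits Basic Model 1:

```sql
-- The group by list contains the distribution column, so every node aggregates
-- its own shard in parallel and the coordinator only merges the per-node results.
select pname, count(*)
from emp
group by pname;
```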
If these conditions are not met, data redistribution is used to transform the data so that it fits the basic models, yielding two derived execution plans:
- Redistribution Group By Execution Plan
- Two-Phase Group By Execution Plan
2.4.1 Redistribution Group By Execution Plan
Conditions:
- The group by table is a distributed table.
- The group by columns do not include the table's hash distribution column.
Execution:
1) Perform an initial aggregation on each node to reduce network transfer volume.
2) Redistribute the data by one of the group by columns, creating a temporary table tmp1.
3) Follow Model 1 for further parallel execution.
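Again using the emp table from section 2.2, grouping by a non-distribution column such as position (pname is the distribution column in that DDL) would typically trigger this plan:

```sql
-- 'position' is not the distribution column: each node pre-aggregates locally,
-- the partial results are redistributed by 'position' into a temporary table
-- (tmp1), and the final aggregation then runs per Model 1.
select position, count(*)
from emp
group by position;
```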
2.4.2 Two-Phase Group By Execution Plan
Conditions:
- The group by table is a distributed table.
- The group by columns do not include the table's hash distribution column.
Execution:
1) Each node performs an initial aggregation on the group by columns.
2) The aggregated data is sent to a single node for further aggregation and sorting.
3. Execution Plan Parameters
3.1 Key Parameters
The specific execution plan selected during SQL execution is influenced by certain parameters:
- gcluster_hash_redistribute_join_optimize: Controls dynamic redistribution for equality joins.
  - Values: 0 (small table broadcast), 1 (dynamic redistribution), 2 (automatic evaluation)
  - Default: 2
- gcluster_hash_redist_threshold_row: Row-count threshold for automatic redistribution when gcluster_hash_redistribute_join_optimize is set to 2.
  - Values: 0 (no limit), or a positive integer used as the row-count threshold
  - Default: 0
- gcluster_hash_redistribute_groupby_optimize: Controls dynamic redistribution for group by.
  - Values: 0 (disabled), 1 (dynamic redistribution)
  - Default: 1
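These are cluster variables; as a minimal sketch (the MySQL-style show/set syntax, and whether a given variable can be changed at session level, are assumptions that may vary by version), they can be inspected and adjusted like this:

```sql
-- Inspect the current values of the redistribution-related parameters.
show variables like 'gcluster_hash_redist%';

-- Illustrative: force dynamic redistribution for equality joins.
set gcluster_hash_redistribute_join_optimize = 1;
```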
3.2 Parameter Influence on Execution Plans
Based on the characteristics of the tables and conditions, these parameters determine which redistribution plan the optimizer chooses, as summarized in the tables below.
| Join Scenario | Parameter 1: gcluster_hash_redistribute_join_optimize | Parameter 2: gcluster_hash_redist_threshold_row | Execution Plan |
|---|---|---|---|
| Both tables are distributed tables, and the join condition is on the hash distribution columns of both tables | = 2 or = 1 | No effect | The join is executed locally, following Basic Model 1 |
| One table is a distributed table and the other is a replicated table | No effect | No effect | The join is executed locally, following Basic Model 2 |
| Both tables are distributed tables, and the join condition is on the hash distribution column of only one table | = 2: the plan depends on Parameter 2; = 1: follow Execution Plan 1; = 0: follow Execution Plan 2 | When Parameter 1 = 2: if the smaller table's row count is below gcluster_hash_redist_threshold_row and the row-count difference between the two tables exceeds 20%, follow Execution Plan 2; otherwise follow Execution Plan 1 | Execution Plan 1: redistribute the table whose join column is not its distribution column by that join column (Single Table Redistribution Join). Execution Plan 2: convert the smaller table to a replicated table (Small Table Broadcast Join) |
| Both tables are distributed tables, and the join condition is not on the hash distribution column of either table | = 2: the plan depends on Parameter 2; = 1: follow Execution Plan 1; = 0: follow Execution Plan 2 | When Parameter 1 = 2: if the smaller table's row count is below gcluster_hash_redist_threshold_row and the row-count difference between the two tables exceeds 20%, follow Execution Plan 2; otherwise follow Execution Plan 1 | Execution Plan 1: redistribute both tables by their join columns (Dual Table Redistribution Join). Execution Plan 2: convert the smaller table to a replicated table (Small Table Broadcast Join) |
| Group By Scenario | Parameter: gcluster_hash_redistribute_groupby_optimize | Execution Plan |
|---|---|---|
| The group by list contains the hash distribution column, with no function or expression applied to it | = 1 | The group by is executed locally, following Basic Model 1 |
| Any other case | = 1: follow Execution Plan 1; = 0: follow Execution Plan 2 | Execution Plan 1: redistribute the table by the first column (or function/expression) in the group by list (Redistribution Group By). Execution Plan 2: Two-Phase Group By |
The above covers some practical tips with GBase Database (GBase 数据库). If you're interested in learning more about GBase Database, feel free to follow our related technical articles.