GBase 8a Implementation Guide: Parameter Optimization
Cong Li
Posted on June 28, 2024
1. Overview
Optimizing the configuration of GBase 8a clusters varies across different application scenarios. A single set of fixed parameters cannot meet all situations. This chapter provides general principles for parameter tuning based on accumulated optimization experiences in various projects and business scenarios.
2. Thread Pool and Parallelism Parameters
2.1. Increasing Thread Stack Size to Avoid Crashes
thread_stack
When dealing with large SQL statements, a small thread stack (thread_stack
) can cause stack overflow and crashes. It's necessary to configure the thread stack size for both gcluster
and gnode
:
- Recommended
gcluster
thread_stack: 4M (default is 4M in version 953) - Recommended
gnode
thread_stack: 512K (default is 256K). If stack overflow continues, increase this parameter further.
In the configuration files for gcluster
and gnode
, add thread_stack = 524288
and restart the services after configuration.
2.2. Common Parallelism Parameter Adjustments to Enhance Performance
gcluser_max_thread_in_pool
This parameter defines the number of connection threads in the global connection thread pool of the cluster layer connecting nodes (gnode
). The default value is 600.
Generally, this default value suffices as the global thread pool primarily uses asynchronous API logic. However, for larger clusters with high SQL concurrency, if the thread pool's thread count is frequently exhausted, adjust gcluser_max_thread_in_pool
. It’s advisable not to exceed 2000, as too large a value may degrade performance.
gbase_parallel_degree
This parameter configures the parallelism level for a single SQL execution. The default value is 0, recommended to set between 4 and 8.
- A value of 0 uses half the CPU cores' count for parallelism of a single SQL task.
- If the value exceeds the thread pool count, it uses the thread pool count.
For example, with gbase_parallel_degree=0
and 32 CPU cores, the parallelism for a single SQL task would be 16.
gbase_loader_parallel_degree
This parameter sets the number of threads for each load task on gnode
. The default is 0, using half the CPU cores' count. Recommended value is between 4 and 8.
- A value of 0 uses half the CPU cores' count.
- If the value exceeds the thread pool count, it uses the thread pool count.
gbase_parallel_max_thread_in_pool
This parameter specifies the total number of threads in the thread pool, defaulting to twice the number of CPU cores. When deploying a single gnode
node per server, the recommended value is 4-8 times the number of CPU cores.
Each task fetches threads from the pool for execution. If the pool lacks sufficient idle threads, new threads are created for serial execution.
thread_pool_size
This parameter indicates the number of thread pools for receiving client connections, impacting throughput.
2.3. Controlling the Number of SQL Tasks Issued by gcluster
gcluster_serial_exec_query
The gcluster_serial_exec_query
parameter, defaulting to 0, specifies the number of SQL tasks gcluster
can issue when gcluster_enable_serial_load
is enabled. It primarily controls the issuance of select
and insert select
tasks. Resource management also controls the issuance of SQL tasks.
gcluster_enable_serial_load
The gcluster_enable_serial_load
parameter is a switch that, when enabled (set to 1), controls the number of SQL tasks issued by gcluster
. When enabled, gcluster_serial_exec_query
takes effect. If disabled (default value 0), it has no control.
3. Load Parameters
3.1. Parameters to Enhance Loading Performance
gcluster_loader_max_data_processors
This parameter defines the maximum number of loading machines for a single data loading task. The default value is 16, with a recommended range of 4-8.
When using SFTP for loading, pay attention to SSH connection limits. Excessive loading machine connections might exceed the allowed maximum connections, causing load failures. Adjust SSHD configuration files to modify maxstartup
, maxsession
, and other parameters.
3.2. Load Task Load Balancing Settings
gcluster_load_rebalance_seed
This parameter specifies the maximum difference in the number of loading tasks on different data nodes before initiating node load task balancing. The default value is 5. For better balance, set it to 2 or 1.
For instance, if five concurrent load tasks all select the 10th node as the loading machine, this node's task count is 5, while some nodes might have 0 tasks, resulting in a maximum task count difference of 5. When the 6th load task is issued, nodes with a task difference less than 5 are selected as loading machines.
3.3. Setting the Maximum Length of a Single Load Line
gbase_loader_max_line_length
This parameter specifies the maximum allowed length of a single line in the data loading file, defaulting to 4M.
GBase 8a can load data lines up to 4M. If any line exceeds this size, the load task fails with the error “Line length xxxx is more than gbase_loader_max_line_length(4194304)”. Adjust this parameter to a larger value to succeed in loading tasks, skipping lines exceeding 4M as error data.
3.4. Setting Load Read Timeout
gbase_loader_read_timeout
This parameter specifies the timeout for reading FTP/HTTP/SFTP files. If filling a data block (8M) exceeds this value, the load task stops with an error. A value of 0 means no timeout. The default is 300 seconds.
In poor network environments, increase this parameter to avoid read operation timeout errors.
4. Data Transfer Parameters
4.1. Adjusting Result Set Transfer Parameters to Avoid Write File Failures
gcluster_send_client_data_timeout
This parameter specifies the timeout for the cluster to send result set data to the client. The range is [0,100000] seconds, with a default of 30 seconds. A value of 0 means no timeout, with infinite waiting until success or failure.
In poor network conditions or when fetching result sets through JDBC with additional processing logic, increasing this parameter can prevent connection failures (error “com.gbase.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure”).
4.2. Setting for Large Data Inserts with insert values
max_allowed_packet
For large data inserts, such as inserting tens of thousands of rows at once, increase the max_allowed_packet
parameter.
Configure it using set global max_allowed_packet = 134217728
.
Note: The max_allowed_packet
parameter must be written under [gbased]
and above [gbasedump]
in the configuration file.
4.3. Inter-node Data Transfer Parameters
_gbase_sis_ctq_threads
This parameter specifies the number of threads for data processing on the receiving end during insert select
or load operations. The default value is 0, using the number of CPU cores. If insufficient, temporary threads are created. Use show engine express status
to view node connections, session status, and task sending and receiving situations.
The _gbase_sis_ctq_threads
parameter sets the number of CTQs (CommonTaskQueue). Each CTQ corresponds to an executor, with the default being twice the number of CPU cores. A recommended setting is 512.
In large-scale cluster concurrent scenarios, insufficient CTQs can lead to performance issues due to repeated CTQ checks and temporary executor creation.
_gbase_rep_receive_buffer_size
This parameter specifies the maximum waiting data size for node-to-node data transfers during insert
and load operations. _gbase_gns_queue_size
indicates the message queue length for each sending session. Increase this parameter if GNS data transfer performance issues occur.
5. Parameters for Troubleshooting
5.1. Intermediate Temporary Result Retention
gcluster_executor_debug
When enabled, this parameter retains temporary tables during SQL execution (located under the gctmpdb
database). This is used for troubleshooting and analysis. The default is disabled.
5.2. Temporary File Storage Path Configuration
tmpdir
This parameter specifies the storage path for temporary files, defaulting to /tmp
. Adjust it to an independent directory under the installation path, such as /opt/tmp_gcluster
and /opt/tmp_gnode
.
Note: Ensure the modified tmpdir
directory exists and has read/write permissions for the DBA user gbase
. Avoid placing it under $GCLUSTER_BASE
or $GBASE_BASE
directories, as it can cause node replacement failures. Before node replacement, create the tmpdir
directory on the new machine and grant read/write permissions to the gbase
user.
Posted on June 28, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
November 29, 2024