GBase 8a Implementation Guide: Parameter Optimization

congcong

Cong Li

Posted on June 28, 2024

GBase 8a Implementation Guide: Parameter Optimization

1. Overview

Optimizing the configuration of GBase 8a clusters varies across different application scenarios. A single set of fixed parameters cannot meet all situations. This chapter provides general principles for parameter tuning based on accumulated optimization experiences in various projects and business scenarios.

2. Thread Pool and Parallelism Parameters

2.1. Increasing Thread Stack Size to Avoid Crashes

thread_stack

When dealing with large SQL statements, a small thread stack (thread_stack) can cause stack overflow and crashes. It's necessary to configure the thread stack size for both gcluster and gnode:

  • Recommended gcluster thread_stack: 4M (default is 4M in version 953)
  • Recommended gnode thread_stack: 512K (default is 256K). If stack overflow continues, increase this parameter further.

In the configuration files for gcluster and gnode, add thread_stack = 524288 and restart the services after configuration.

2.2. Common Parallelism Parameter Adjustments to Enhance Performance

gcluser_max_thread_in_pool

This parameter defines the number of connection threads in the global connection thread pool of the cluster layer connecting nodes (gnode). The default value is 600.

Generally, this default value suffices as the global thread pool primarily uses asynchronous API logic. However, for larger clusters with high SQL concurrency, if the thread pool's thread count is frequently exhausted, adjust gcluser_max_thread_in_pool. It’s advisable not to exceed 2000, as too large a value may degrade performance.

gbase_parallel_degree

This parameter configures the parallelism level for a single SQL execution. The default value is 0, recommended to set between 4 and 8.

  • A value of 0 uses half the CPU cores' count for parallelism of a single SQL task.
  • If the value exceeds the thread pool count, it uses the thread pool count.

For example, with gbase_parallel_degree=0 and 32 CPU cores, the parallelism for a single SQL task would be 16.

gbase_loader_parallel_degree

This parameter sets the number of threads for each load task on gnode. The default is 0, using half the CPU cores' count. Recommended value is between 4 and 8.

  • A value of 0 uses half the CPU cores' count.
  • If the value exceeds the thread pool count, it uses the thread pool count.

gbase_parallel_max_thread_in_pool

This parameter specifies the total number of threads in the thread pool, defaulting to twice the number of CPU cores. When deploying a single gnode node per server, the recommended value is 4-8 times the number of CPU cores.

Each task fetches threads from the pool for execution. If the pool lacks sufficient idle threads, new threads are created for serial execution.

thread_pool_size

This parameter indicates the number of thread pools for receiving client connections, impacting throughput.

2.3. Controlling the Number of SQL Tasks Issued by gcluster

gcluster_serial_exec_query

The gcluster_serial_exec_query parameter, defaulting to 0, specifies the number of SQL tasks gcluster can issue when gcluster_enable_serial_load is enabled. It primarily controls the issuance of select and insert select tasks. Resource management also controls the issuance of SQL tasks.

gcluster_enable_serial_load

The gcluster_enable_serial_load parameter is a switch that, when enabled (set to 1), controls the number of SQL tasks issued by gcluster. When enabled, gcluster_serial_exec_query takes effect. If disabled (default value 0), it has no control.

3. Load Parameters

3.1. Parameters to Enhance Loading Performance

gcluster_loader_max_data_processors

This parameter defines the maximum number of loading machines for a single data loading task. The default value is 16, with a recommended range of 4-8.

When using SFTP for loading, pay attention to SSH connection limits. Excessive loading machine connections might exceed the allowed maximum connections, causing load failures. Adjust SSHD configuration files to modify maxstartup, maxsession, and other parameters.

3.2. Load Task Load Balancing Settings

gcluster_load_rebalance_seed

This parameter specifies the maximum difference in the number of loading tasks on different data nodes before initiating node load task balancing. The default value is 5. For better balance, set it to 2 or 1.

For instance, if five concurrent load tasks all select the 10th node as the loading machine, this node's task count is 5, while some nodes might have 0 tasks, resulting in a maximum task count difference of 5. When the 6th load task is issued, nodes with a task difference less than 5 are selected as loading machines.

3.3. Setting the Maximum Length of a Single Load Line

gbase_loader_max_line_length

This parameter specifies the maximum allowed length of a single line in the data loading file, defaulting to 4M.

GBase 8a can load data lines up to 4M. If any line exceeds this size, the load task fails with the error “Line length xxxx is more than gbase_loader_max_line_length(4194304)”. Adjust this parameter to a larger value to succeed in loading tasks, skipping lines exceeding 4M as error data.

3.4. Setting Load Read Timeout

gbase_loader_read_timeout

This parameter specifies the timeout for reading FTP/HTTP/SFTP files. If filling a data block (8M) exceeds this value, the load task stops with an error. A value of 0 means no timeout. The default is 300 seconds.

In poor network environments, increase this parameter to avoid read operation timeout errors.

4. Data Transfer Parameters

4.1. Adjusting Result Set Transfer Parameters to Avoid Write File Failures

gcluster_send_client_data_timeout

This parameter specifies the timeout for the cluster to send result set data to the client. The range is [0,100000] seconds, with a default of 30 seconds. A value of 0 means no timeout, with infinite waiting until success or failure.

In poor network conditions or when fetching result sets through JDBC with additional processing logic, increasing this parameter can prevent connection failures (error “com.gbase.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure”).

4.2. Setting for Large Data Inserts with insert values

max_allowed_packet

For large data inserts, such as inserting tens of thousands of rows at once, increase the max_allowed_packet parameter.

Configure it using set global max_allowed_packet = 134217728.

Note: The max_allowed_packet parameter must be written under [gbased] and above [gbasedump] in the configuration file.

4.3. Inter-node Data Transfer Parameters

_gbase_sis_ctq_threads

This parameter specifies the number of threads for data processing on the receiving end during insert select or load operations. The default value is 0, using the number of CPU cores. If insufficient, temporary threads are created. Use show engine express status to view node connections, session status, and task sending and receiving situations.

The _gbase_sis_ctq_threads parameter sets the number of CTQs (CommonTaskQueue). Each CTQ corresponds to an executor, with the default being twice the number of CPU cores. A recommended setting is 512.

In large-scale cluster concurrent scenarios, insufficient CTQs can lead to performance issues due to repeated CTQ checks and temporary executor creation.

_gbase_rep_receive_buffer_size

This parameter specifies the maximum waiting data size for node-to-node data transfers during insert and load operations. _gbase_gns_queue_size indicates the message queue length for each sending session. Increase this parameter if GNS data transfer performance issues occur.

5. Parameters for Troubleshooting

5.1. Intermediate Temporary Result Retention

gcluster_executor_debug

When enabled, this parameter retains temporary tables during SQL execution (located under the gctmpdb database). This is used for troubleshooting and analysis. The default is disabled.

5.2. Temporary File Storage Path Configuration

tmpdir

This parameter specifies the storage path for temporary files, defaulting to /tmp. Adjust it to an independent directory under the installation path, such as /opt/tmp_gcluster and /opt/tmp_gnode.

Note: Ensure the modified tmpdir directory exists and has read/write permissions for the DBA user gbase. Avoid placing it under $GCLUSTER_BASE or $GBASE_BASE directories, as it can cause node replacement failures. Before node replacement, create the tmpdir directory on the new machine and grant read/write permissions to the gbase user.

💖 💪 🙅 🚩
congcong
Cong Li

Posted on June 28, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related