GBase 8a MPP Cluster Multi-Instance Best Practices

Cong Li

Posted on July 4, 2024

1 Overview

1.1 Overview

Deploying GBase 8a MPP Cluster on high-end servers, particularly those with a NUMA (Non-Uniform Memory Access) architecture, often leaves hardware resources underutilized when only one database instance is deployed per server. For example:

  • When server memory exceeds 300GB, a single database instance struggles to use all of it effectively.
  • When the server CPU has more than 40 logical cores, a single database instance cannot scale performance linearly as the core count increases.
  • On NUMA servers with multiple NUMA nodes, frequent cross-node memory access by a single database instance leads to suboptimal performance.
  • A single database instance cannot fully exploit new hardware such as SSD/NVMe drives.

GBase 8a MPP Cluster V9.5.3 officially supports multi-instance deployment. Deploying multiple database instances on a single server addresses these utilization issues and improves cluster performance. In actual tests, multi-instance deployment on NUMA and high-end servers improved cluster performance by more than 1x compared with single-instance deployment.

This document introduces the installation process, configuration recommendations, and management methods for GBase 8a MPP Cluster V9.5.3 in multi-instance deployment scenarios.

1.2 Terminology

The terms used in this document are explained as follows:

  • Multi-instance: deploying multiple data cluster nodes on a single physical server; each data cluster node is referred to as a database instance.
  • NUMA: Non-Uniform Memory Access.
  • gcware node: cluster management node, used to share cluster state among gcluster nodes.
  • gcluster node: cluster scheduling node, responsible for SQL parsing, SQL optimization, distributed execution plan generation, and execution scheduling.
  • data node: data cluster node, also known as a gnode; the storage and computation unit of the cluster.

2 Multi-Instance Installation Deployment

2.1 Deployment Plan

In a multi-instance deployment, GBase 8a MPP Cluster installs multiple data nodes on each server. Each data node must be configured with a unique IP address, and nodes are distinguished by these IP addresses. At most one gcluster node and one gcware node can be installed per physical server:

(Figure: multi-instance deployment topology, with multiple data nodes per server and at most one gcluster node and one gcware node each)

Before deploying the cluster, the following tasks need to be planned and completed:

1) Evaluate and determine the number of instances to be deployed on each server:

  • Based on the number of NUMA nodes, memory size, cluster size, and business scenarios (load), evaluate the number of database instances to be deployed on each server. It is generally recommended to deploy no more than 4 database instances on a physical server, with each instance having at least 64GB of available memory.

2) IP address resource planning:

  • Apply for an IP address for each database instance for internal communication within the cluster. It is recommended to configure multiple network cards on the physical server and bind them in load-balancing mode.

3) Disk planning:

  • Different database instances should use different disk groups for disk I/O isolation. For example, configure a RAID5 disk group for each database instance.

4) Determine cluster architecture:

  • It is recommended to have an odd number of gcware nodes and gcluster nodes, with a maximum of one gcware node and one gcluster node per physical server. It is suggested to deploy gcware and gcluster nodes on a single NUMA node, separate from data nodes.

5) Ensure server and OS environment meet GBase 8a cluster installation requirements:

  • Refer to the GBase 8a MPP Cluster product manual.

2.2 Cluster Installation

Example server IPs for installation:

  • Server 1:

    • IP1: 192.168.146.20
    • IP2: 192.168.146.40
  • Server 2:

    • IP3: 192.168.146.21
    • IP4: 192.168.146.41

Current Cluster Version Limitations

  1. The installation package only checks for RedHat, SUSE, and CentOS systems. Other systems need manual adjustments to bypass related checks.
  2. Supports Python versions 2.6 and 2.7, not Python 3.
  3. A physical machine can have only one coordinator and one gcware, and they must share the same IP.

2.2.1 Configure Multiple IPs

For servers with multiple 10Gbps network cards, bond the cards to provide network high availability and maximum bandwidth.
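As a point of reference (not shown in the original post), on RHEL/CentOS 7 a bonded interface is usually described by an ifcfg-bond0 file plus one slave file per member card; mode 6 (balance-alb) is one load-balancing mode that does not require switch-side configuration. The interface names and addresses below are illustrative only:

# /etc/sysconfig/network-scripts/ifcfg-bond0  (illustrative)
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=6 miimon=100"
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.146.20
NETMASK=255.255.255.0

# /etc/sysconfig/network-scripts/ifcfg-p6p1  (one member card; repeat for each)
DEVICE=p6p1
TYPE=Ethernet
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none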

For each database instance, configure an IP address on a single network card or bound network cards. Example configuration:

vim /etc/sysconfig/network-scripts/ifcfg-p6p2

(Screenshot: ifcfg-p6p2 with the additional IPADDR1/NETMASK1 entries)

In this file, IPADDR1 is the first virtual IP address and NETMASK1 is the subnet mask for that virtual IP; follow the same pattern (IPADDR2/NETMASK2, and so on) for additional virtual IPs. Each NETMASKn must match the netmask of the physical IP; if it is omitted, the virtual IP may be allocated to a different subnet than the physical IP.
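A minimal sketch of such a file, assuming the RHEL/CentOS 7 ifcfg format used above and the example addresses of Server 1 (physical IP 192.168.146.20 plus virtual IP 192.168.146.40 for the second instance); adjust interface names and values to your environment:

# /etc/sysconfig/network-scripts/ifcfg-p6p2  (illustrative)
TYPE=Ethernet
DEVICE=p6p2
NAME=p6p2
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.146.20        # physical IP
NETMASK=255.255.255.0
IPADDR1=192.168.146.40       # first virtual IP, used by the second instance
NETMASK1=255.255.255.0       # must match the physical IP's netmask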

  • When the network cards are not bonded, configure one network card for each instance, as shown in the example below:

(Screenshot: per-instance network card configuration)

Restart the network service to apply changes:

service network restart
# or
systemctl restart network

2.2.2 Prepare for Installation

Follow the GBase 8a cluster installation steps as outlined in the product manual. Before formal installation:

  • Create a gbase user on each server:
  useradd gbase
  passwd gbase
  • Upload and extract the installation files:
  tar xjf GBase8a_MPP_Cluster-NoLicense-9.5.3.17-redhat7.3-x86_64.tar.bz2
  chown -R gbase:gbase gcinstall
  • Copy and execute SetSysEnv.py on all servers to configure environment variables:
  scp SetSysEnv.py root@192.168.146.21:/opt
  python SetSysEnv.py --installPrefix=/opt --dbaUser=gbase
  • Adjust the permissions of the installation path so that the gbase user can write to it, for example:
  drwxr-x---. 6 gbase gbase 157 Jan 28 18:59 opt
  • Modify the installation configuration file demo.options:

(Screenshot: demo.options configuration for this example)
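As an illustration only, a demo.options for this two-server, four-instance example might look like the sketch below. The parameter names follow the installer's template in the gcinstall directory; check them against the template shipped with your version, and note that each physical server contributes one coordinator/gcware IP but all of its instance IPs as data hosts:

installPrefix = /opt
coordinateHost = 192.168.146.20,192.168.146.21
gcwareHost = 192.168.146.20,192.168.146.21
dataHost = 192.168.146.20,192.168.146.40,192.168.146.21,192.168.146.41
existCoordinateHost =
existGcwareHost =
existDataHost =
dbaUser = gbase
dbaGroup = gbase
dbaPwd = 'gbase1234'
rootPwd = '111111'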

2.2.3 Execute Installation

As the gbase user, run the installation:

python gcinstall.py --silent=demo.options

2.2.4 Obtain License

If installing the no-license version, skip this section.

1) Fingerprint Collection:
Use one instance IP from each multi-instance server to obtain the fingerprints:

   ./gethostsid -n 192.168.146.20,192.168.146.21 -u gbase -p gbase -f hostsfingers.txt

2) Generate License:
Email the hostsfingers.txt file to the vendor to obtain the license.

3) Import License:
Import the license to all instances:

   ./License -n 192.168.146.20,192.168.146.21,192.168.146.40,192.168.146.41 -u gbase -p gbase -f gbase.lic

2.2.5 Cluster Initialization

Configure distribution and execute initialization as per the product manual.

(Screenshot: distribution configuration files)

gcadmin createvc vc.xml
gcadmin distribution gcChangeInfo.xml p 1 d 1 vc vc1

(Screenshot: gcadmin output showing the resulting distribution)

Ensure primary and standby data slices are on different physical servers for data high availability.
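For reference, one possible gcChangeInfo.xml for this example is sketched below, following the convention of grouping the instances of each physical server into their own rack element so that a slice's primary and standby copies land on different servers; verify the actual placement rule and file contents against the product manual:

<?xml version="1.0" encoding="utf-8"?>
<servers>
  <!-- instances on physical server 1 -->
  <rack>
    <node ip="192.168.146.20"/>
    <node ip="192.168.146.40"/>
  </rack>
  <!-- instances on physical server 2 -->
  <rack>
    <node ip="192.168.146.21"/>
    <node ip="192.168.146.41"/>
  </rack>
</servers>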

2.3 NUMA Binding

For high-configuration servers with NUMA architecture, it is recommended to evenly allocate NUMA nodes to different GBase instances. For example, on a server with 8 NUMA nodes running two GBase instances, bind 4 NUMA nodes to each instance.

2.3.1 View NUMA Groups

For servers with NUMA architecture, you need to install numactl in advance on each server as follows:

(Screenshot: numactl installation)

Note: numactl must be installed, because the modified service startup scripts depend on it.
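On RHEL/CentOS-family systems, for example, numactl is typically installed from the distribution repositories (use the equivalent zypper command on SUSE):

yum install -y numactl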

Use the numastat command to view the server's NUMA nodes. The examples below show servers with 4 NUMA nodes and 8 NUMA nodes. Depending on the number of NUMA nodes, you can allocate 1, 2, or 4 NUMA nodes per instance (IP), and so on.

4 NUMA nodes:

(Screenshot: numastat output on a server with 4 NUMA nodes)

8 NUMA nodes:

(Screenshot: numastat output on a server with 8 NUMA nodes)
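The original screenshots are not reproduced here; the following standard commands report the NUMA layout (node count, and the CPUs and memory belonging to each node) so you can decide how many nodes to assign to each instance:

numastat                    # per-node memory allocation statistics
numactl --hardware          # nodes, their CPUs, and memory sizes
lscpu | grep -i numa        # quick summary of NUMA node CPU ranges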

2.3.2 Bind GBase 8a Instances to NUMA Nodes

For servers that host only data nodes (GBase instances), it is recommended to divide the server's NUMA nodes (CPUs and memory) evenly among the data nodes. For servers that host both data nodes (GBase instances) and gcluster/gcware nodes, it is advisable to place the gcluster and gcware nodes together on one NUMA node.

The binding between GBase 8a instances and NUMA nodes is configured by modifying the gcluster_services script. After a multi-instance installation, the cluster startup command gcluster_services may resolve to the copy of this script under any instance (both gnode and gcluster instances have one), so pick the gcluster_services file of one instance and add the binding commands there. For example, modify the gcluster_services file under IP/gnode/server/bin.

There are two methods to start the database service thereafter:

Method 1: Use the modified gcluster_services file every time you start the database service.

cd IP/gnode/server/bin
./gcluster_services all start

Method 2: Copy the modified gcluster_services file to replace the gcluster_services files under all instances (IP/gnode/server/bin/gcluster_services and IP/gcluster/server/bin/gcluster_services). Subsequently, use the regular cluster startup command:

gcluster_services all start

Example: for a server with 8 NUMA nodes where each instance is bound to 4 NUMA nodes:

Choose any instance's gnode/server/bin/gcluster_services file and modify the following sections:

1) Original section around line 410:

(Screenshot: original code around line 410 of gcluster_services)

Modify to:

(Screenshot: the same section with the numactl binding added)

Note:

  • numactl --membind=nodes program: allocate memory for program only from the specified NUMA nodes (nodes can be 0, 1, or other node numbers; program can be an absolute path or a service startup script).
  • numactl --cpunodebind=nodes program: run program only on the CPUs of the specified NUMA nodes.

An illustrative sketch of this kind of change is given after step 3) below.

2) Original section around line 500 (the code executed by gcluster_services all start):

(Screenshot: original code around line 500 of gcluster_services)

Modify to:

(Screenshot: the same section with the numactl binding added)

3) Original section around line 450 (the code executed by gcluster_services gbase|syncserver start):

(Screenshot: original code around line 450 of gcluster_services)

Modify to:

(Screenshot: the same section with the numactl binding added)
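The script excerpts themselves are not reproduced here. The sketch below only illustrates the general shape of the change, assuming the script launches gbased (and gcluster/syncserver) with a plain command line that can be prefixed with numactl; the path and the elided arguments are placeholders, not the script's actual content:

# Hypothetical launch line inside gcluster_services -- before:
#   $INSTALL_PATH/gnode/server/bin/gbased ... &
# After, binding this instance's gbased to NUMA nodes 0-3 (CPUs and memory):
#   numactl --cpunodebind=0-3 --membind=0-3 $INSTALL_PATH/gnode/server/bin/gbased ... &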

After making these changes to the files above, restart the cluster service:

cd IP/gnode/server/bin
./gcluster_services all start

You can verify the NUMA binding effect using the following command:

numastat `pidof gbased`

For example, to view the binding effect on a server with 2 instances and 2 NUMA nodes, where each instance is bound to 1 NUMA node:

[root@pst_w61 config]$ numastat `pidof gbased`

(Screenshot: numastat output showing each gbased process bound to its own NUMA node)
