GBase 8c Streaming Replication Parameter Configuration

congcong

Cong Li

Posted on July 23, 2024

GBase 8c Streaming Replication Parameter Configuration

GBase 8c uses streaming replication for data synchronization between primary and standby nodes in both primary-standby and distributed deployment modes. Streaming replication, introduced in PostgreSQL 9.0, transmits WAL logs from the primary to the standby as soon as they are generated.

Streaming replication consists of two main phases: instance recovery and primary-standby synchronization.

  • Instance Recovery Phase
    When the pg database crashes, upon restart, it replays all WAL logs from the last checkpoint before the crash to restore the database to a consistent state. During the setup of the standby database, the primary database usually remains online, causing the backup to be in an inconsistent state. When the standby starts, the startup process performs instance recovery.

  • Primary-Standby Synchronization Phase
    The wal receiver process sends the standby's latest LSN to the primary. The wal sender then sends the WAL logs from the primary's latest LSN to the standby's latest LSN. The wal receiver receives the WAL logs and writes them to disk, while the startup process replays the WAL logs on the standby.

Below are the main parameters involved in GBase 8c streaming replication:

1. Sending Server Parameters

max_wal_senders

  • Description: Specifies the maximum number of concurrent connections for transaction log sending processes. Should not be greater than max_connections.
  • Parameter Type: POSTMASTER
  • Note: wal_level must be set to archive or hot_standby to allow standby connections.
  • Range: Integer, 0-1024
  • Suggested Range: 8-100
  • Default: 16

wal_keep_segments

  • Description: Number of Xlog segments to keep. Sets the minimum number of transaction log files to retain in the pg_xlog directory for streaming replication.
  • Parameter Type: SIGHUP
  • Range: Integer, 2 - INT_MAX
  • Default: 16
  • Recommendations:
    • May need to be increased for large data imports (e.g., 50G data may require a setting of 1000).
    • Should be increased when synchronous_commit level is below LOCAL_FLUSH to avoid rebuilding failures.

wal_sender_timeout

  • Description: Maximum wait time for the transaction log receiver to receive logs.
  • Parameter Type: SIGHUP
  • Range: Integer, 0 - INT_MAX (ms)
  • Default: 6s

logical_sender_timeout

  • Description: Maximum wait time for the logical log receiver to receive logs.
  • Parameter Type: USERSET
  • Range: Integer, 0 - INT_MAX (ms)
  • Default: 30s

max_replication_slots

  • Description: Sets the number of replication slots on the primary server.
  • Parameter Type: POSTMASTER
  • Range: Integer, 0-1024
  • Default: 8

enable_slot_log

  • Description: Enables logical replication slot synchronization between primary and standby.
  • Parameter Type: USERSET
  • Range: Boolean
  • Default: off

max_changes_in_memory

  • Description: Maximum memory size for caching a single transaction during logical decoding.
  • Parameter Type: POSTMASTER
  • Range: Integer, 1-2147483647
  • Default: 4096

max_cached_tuplebufs

  • Description: Maximum memory size for caching total tuple information during logical decoding.
  • Parameter Type: POSTMASTER
  • Range: Integer, 1-2147483647
  • Default: 8192

logical_decode_options_default

  • Description: Specifies the global default values for logical decoding options when not explicitly specified.
  • Parameter Type: SIGHUP
  • Range: Comma-separated key=value strings
  • Default: ""

enable_wal_shipping_compression

  • Description: Enables WAL log compression in streaming disaster recovery mode.
  • Parameter Type: SIGHUP
  • Range: Boolean
  • Default: false

repl_auth_mode

  • Description: Sets the authentication mode for primary-standby replication and standby rebuilds.
  • Parameter Type: SIGHUP
  • Range: Enum (off, default, uuid)
  • Default: default

replconninfo1 - replconninfo8

  • Description: Sets the information for the first to eighth nodes for listening and authentication.
  • Parameter Type: SIGHUP
  • Range: String
  • Default: ""

cross_cluster_replconninfo1 - cross_cluster_replconninfo8

  • Description: Sets the information for the first to eighth nodes for cross-cluster listening and authentication.
  • Parameter Type: SIGHUP
  • Range: String
  • Default: ""

available_zone

  • Description: Sets the zone information for the local node.
  • Parameter Type: POSTMASTER
  • Range: String
  • Default: ""

max_keep_log_seg

  • Description: Flow control parameter. Limits the number of unparsed physical log files during logical replication.
  • Parameter Type: USERSET
  • Range: Integer, 0 - 2147483647
  • Default: 0

2. Primary Server Parameters

synchronous_standby_names

  • Parameter Description: List of names of potential synchronous standby servers, separated by commas. This parameter belongs to the SIGHUP type.
  • Notes:
  • The current synchronous standby server connected is the first name in the list. If the current synchronous standby server loses connection, it will immediately switch to the next higher priority standby server and put the name of this standby server in the list.
  • The standby server name can be specified by setting the environment variable PGAPPNAME.
  • Value Range: String. When the value is *, it matches any provided synchronous standby server name. The configuration supports the following formats:
    • ANY num_sync(standby_name [, …]) [, ANY num_sync (standby_name [, …])]
    • [FIRST] num_sync (standby_name [, …])
    • standby_name [, …]

Explanation:

  • num_sync is the number of synchronous standby servers that the transaction needs to wait for a response from. standby_name is the name of the standby server. FIRST and ANY specify the strategy for selecting synchronous standby servers from the listed servers.
  • ANY N (node1,node2,…) means that any N host names within the parentheses are selected as the list of synchronous standby server names. For example, ANY 1 (node1,node2) means that either node1 or node2 is selected as the synchronous standby server name.
  • ANY N1 (node1,node2,…), ANY N2 (node3,node4,…) means grouped potential synchronous standby server names. Within the first group of parentheses, any N1 host names are selected as the first group of synchronous standby server names. Within the second group of parentheses, any N2 host names are selected as the second group of synchronous standby server names. At this point, the groups have an AND relationship. Both groups must meet their respective synchronous standby server numbers for the local transaction to be committed.
  • FIRST N (node1,node2,…) means selecting the first N host names in the order they appear as the list of synchronous standby server names. For example, FIRST 1 (node1,node2) means selecting node1 as the synchronous standby server name.
  • node1,node2,… and FIRST 1 (node1,node2,…) have the same meaning.

If you use the gs_guc tool to set this parameter, you need to set it as follows:

gs_guc reload -Z datanode -N @NODE_NAME@ -D @DN_PATH@ -c "synchronous_standby_names='ANY NODE 1(dn_instanceId1, dn_instanceId2)'";
Enter fullscreen mode Exit fullscreen mode

or:

gs_guc reload -Z datanode -N @NODE_NAME@ -D @DN_PATH@ -c "synchronous_standby_names='ANY 1(AZ1, AZ2)'";
Enter fullscreen mode Exit fullscreen mode
  • Default Value: *
  • Notes:
  • Duplicate names should not appear in the standby server name list. The num_sync in the configuration should not be greater than the number of the standby server list.
  • For multi-group synchronous standby server configurations like ANY N1 (node1,node2,…), ANY N2 (node3,node4,…), the groups have an AND relationship. Currently, only multi-ANY grouping is supported. Using * for fuzzy matching or having duplicate standby servers in the configuration is not allowed.

most_available_sync

  • Parameter Description: When there is a synchronous standby failure, the host transactions will not be blocked by the synchronous standby failure. For example, if there are two synchronous standby servers, one fails and the other is normal, then the host transactions will only wait for the normal standby server and will not be blocked by the failed standby server. Another example is using the quorum protocol with one master and three synchronous standbys, configured as ANY 2(node1,node2,node3). If node1 and node3 fail while node2 is normal, the host transactions will not be blocked.
  • This parameter belongs to the SIGHUP type.
  • Value Range: Boolean
  • on means the host is not blocked when there is a synchronous standby failure.
  • off means the host is blocked when there is a synchronous standby failure.
  • Default Value: off

keep_sync_window

  • Parameter Description: The delay time to enter the maximum available mode.
  • When the most_available_sync is configured as on and in a master-slave scenario, if a synchronous standby failure causes the current configuration of the number of synchronous standby servers (see the meaning of synchronous_standby_name for details) to be unmet, if the keep_sync_window parameter is configured, the host will continue to maintain the maximum protection mode, blocking the host's transaction commit, and delaying the entry into the maximum available mode for the time set by the keep_sync_window.
  • If the synchronous standby server recovers within the keep_sync_window timeout window and meets the current configuration of the number of synchronous standby servers, the transaction will not be blocked, and the system will return to normal status.
  • If setting keep_sync_window, it is recommended to set a minimum of 5 seconds to avoid false positives for network instability by the monitoring system.
  • This parameter belongs to the SIGHUP type.
  • Value Range: Integer, range 0~INT_MAX, in seconds.
  • 0 means not setting a keep_sync_window timeout window, i.e., directly entering the maximum available mode.
  • Others mean the size of the keep_sync_window timeout window.
  • Default Value: 0
  • Notes:
  • Configuring this parameter may affect the RPO. If the host fails within the configured timeout window, data from the beginning of the block to the host failure time window may be lost.

enable_stream_replication

  • Parameter Description: Controls whether data and log synchronization is performed between master and standby, master and slave.
  • This parameter belongs to the SIGHUP type. Notes:
  • This parameter is for performance testing, used to test performance parameters with and without standby servers. When this parameter is closed, switching, fault, and other abnormal scenarios cannot be tested, otherwise, there will be inconsistencies between the master and standby servers.
  • This parameter is a controlled parameter and is not recommended to be turned off in normal business scenarios.
  • The current version does not support master-slave deployment mode by default.
  • Value Range: Boolean
  • on means enabling master-standby, master-slave synchronization.
  • off means disabling master-standby, master-slave synchronization.
  • Default Value: on

enable_mix_replication

  • Parameter Description: Controls the way WAL logs and data are replicated between master and standby, master and slave.
  • This parameter belongs to the INTERNAL type parameter, the default value is off, and external modifications are not allowed.
  • Notes:
  • This parameter is not allowed to be changed in normal business scenarios, i.e., closing the WAL log and data page mixed replication mode.
  • The current version does not support master-slave deployment mode by default.
  • Value Range: Boolean
  • on means enabling WAL log and data page mixed replication mode.
  • off means disabling WAL log and data page mixed replication mode.
  • Default Value: off

vacuum_defer_cleanup_age

  • Parameter Description: Specifies the number of transactions for VACUUM to use, and VACUUM will delay the cleanup of invalid row storage table records. The number of delayed transactions is set by vacuum_defer_cleanup_age. VACUUM and VACUUM FULL operations will not immediately clean up newly deleted tuples.
  • This parameter belongs to the SIGHUP type.
  • Value Range: Integer, 0~1000000, 0 means no delay.
  • Default Value: 0

data_replicate_buffer_size

  • Parameter Description: The size of the memory queue occupied when passing data pages between the sender and receiver. This parameter affects the replication buffer size between the master and standby servers.
  • This parameter belongs to the POSTMASTER type.
  • Value Range: Integer, 4096~1072693248, in KB.
  • Default Value: 16MB (i.e., 16384KB)

walsender_max_send_size

  • Parameter Description: Sets the size of the log or data sending buffer on the master server.
  • This parameter belongs to the POSTMASTER type.
  • Value Range: Integer, 8~INT_MAX, in KB.
  • Default Value: 8MB (i.e., 8192KB)

enable_data_replicate

  • Parameter Description: When importing data into a row storage table, the master and standby servers can choose the data synchronization method.
  • This parameter belongs to the USERSET type.
  • Value Range: Boolean
  • on means data synchronization between the master and standby servers uses data page mode when importing data into a row storage table. If the replication_type parameter is 1, it is not allowed to set it to on. If it is set to on using the guc tool, it will be forcibly changed to off.
  • off means data synchronization between the master and standby servers uses log (Xlog) mode when importing data into a row storage table.
  • Default Value: off

ha_module_debug

  • Parameter Description: Used to view the replication status log of specific data blocks during data replication.
  • This parameter belongs to the USERSET type.
  • Value Range: Boolean
  • on means logging the status of each data block during data replication.
  • off means not logging the status of each data block during data replication.
  • Default Value: off

enable_incremental_catchup

  • Parameter Description: Controls the method of catch-up between the primary and standby databases. The default setup currently does not support the primary-standby deployment mode.
  • This parameter is of type SIGHUP.
  • Value Range: Boolean
  • on: During standby catch-up, the incremental catch-up method is used. This means the standby database scans its local data files to obtain a list of differential data files between the primary and standby databases and performs the catch-up.
  • off: During standby catch-up, the full catch-up method is used. This means the standby database scans all data files on the primary database to obtain a list of differential data files between the primary and standby databases and performs the catch-up.
  • Default Value: on

sync_config_strategy

  • Parameter Description: The synchronization strategy for configuration files between the primary and standby databases and between the standby and cascading standby databases.
  • This parameter is of type POSTMASTER.
  • Value Range: Enum
  • all_node: When the primary is configured to all_node, it allows the primary to actively synchronize configuration files to all standby databases. When the standby is configured to all_node, it allows the current standby to send synchronization requests to its primary and actively synchronize configuration files to all its cascading standbys. When the cascading standby is configured to all_node, it allows the current cascading standby to send synchronization requests to the previous standby.
  • only_sync_node: When the primary is configured to only_sync_node, it allows the primary to actively synchronize configuration files only to all synchronous standby databases. When the standby is configured to only_sync_node, it allows the current standby to send synchronization requests to its primary but does not allow the current standby to actively synchronize configuration files to all its cascading standbys. When the cascading standby is configured to only_sync_node, it allows the current cascading standby to send synchronization requests to its standby.
  • none_node: When the primary is configured to none_node, it does not allow the primary to actively synchronize configuration files to any standby databases. When the standby is configured to none_node, it does not allow the current standby to send synchronization requests to its primary and does not allow the current standby to actively synchronize configuration files to all its cascading standbys. When the cascading standby is configured to none_node, it does not allow the current cascading standby to send synchronization requests to its standby.
  • Default Value: all_node
  • Notes:
  • In a GBase 8s cluster consisting of primary, standby, and cascading standby databases, the primary is the sender relative to the standby, the standby is the receiver relative to the primary, the standby is the sender relative to the cascading standby, and the cascading standby is the receiver relative to the standby.
  • The synchronization of configuration files by the sender to the receiver and the synchronization request by the receiver to the sender are two independent events. Both will trigger the synchronization of configuration files. If you do not want the configuration files to be synchronized, you need to configure the sync_config_strategy parameter of all nodes in the cluster to none_node. If you only want to synchronize configuration files between the primary and synchronous standby, configure the primary's sync_config_strategy parameter to only_sync_node and all other nodes to none_node. If you want all nodes to synchronize configuration files, configure the sync_config_strategy parameter of all nodes to all_node. Currently, custom synchronization strategies between arbitrary nodes are not supported.
  • The specific behavior of configuration parameter synchronization is that the sender sends configuration files to the receiver, directly overwriting the corresponding parameters in the receiver's configuration file. If the configuration file synchronization strategy is set, modifying the configuration parameters on the receiver side will immediately be overwritten by the sender's configuration parameters, making the receiver's modification ineffective.
  • Even if the configuration file synchronization strategy is set, some configuration parameters will not be synchronized. They include:
    • application_name
    • archive_command
    • audit_directory
    • available_zone
    • cgbase_control_port
    • cgbase_sctp_port
    • listen_addresses
    • log_directory
    • port
    • replconninfo1
    • replconninfo2
    • replconninfo3
    • replconninfo4
    • replconninfo5
    • replconninfo6
    • replconninfo7
    • replconninfo8
    • ssl
    • ssl_ca_file
    • ssl_cert_file
    • ssl_ciphers
    • ssl_crl_file
    • ssl_key_file
    • ssl_renegotiation_limit
    • ssl_cert_notify_time
    • synchronous_standby_names
    • local_bind_address
    • perf_directory
    • query_log_directory
    • asp_log_directory
    • streaming_router_port
    • enable_upsert_to_merge
    • archive_dest
    • recovery_min_apply_delay
    • sync_config_strategy

3. Standby Server Parameters

hot_standby

  • Parameter Description: Sets whether the standby server allows connections and queries during recovery. This parameter is of type POSTMASTER. Notes:
  • If this parameter is set to on, wal_level must be set to hot_standby, otherwise, the database will fail to start.
  • In a dual-machine environment, the hot_standby parameter cannot be set to off as it will affect other functions of the dual-machine setup.
  • If the hot_standby parameter was previously turned off and the wal_level parameter was set below the hot_standby level, before turning the hot_standby parameter back on, ensure that all logs on the standby server waiting for replay support standby queries by following these steps:
    1. Adjust the wal_level parameter on both the primary and standby to hot_standby or higher, and restart the instance to take effect.
    2. Execute a checkpoint operation on the primary server, and use the pg_stat_get_wal_senders() system function to confirm that each standby server's receiver_replay_location has caught up with the primary's current sender_flush_location, ensuring the wal_level adjustment is synchronized and effective on the standby, without needing to replay previous lower-level logs.
    3. Turn on the hot_standby parameter (set to on) on both the primary and standby, and restart the instance to take effect. Value Range: Boolean
  • on: Allows connections and queries on the standby server during recovery.
  • off: Disallows connections and queries on the standby server during recovery.
  • Default Value: on

max_standby_archive_delay

  • Parameter Description: When hot standby mode is enabled, this parameter sets the maximum time the standby server waits before canceling queries that conflict with the processing of archived WAL logs.
  • This parameter is of type SIGHUP.
  • Notes: -1 means the standby server will wait indefinitely for conflicting queries to complete.
  • Value Range: Integer, range: -1 to INT_MAX, in milliseconds.
  • Default Value: 3s (3000ms)

max_standby_streaming_delay

  • Parameter Description: When hot standby mode is enabled, this parameter sets the maximum time the standby server waits before canceling queries that conflict with the processing of streaming replication WAL logs.
  • This parameter is of type SIGHUP.
  • Notes: -1 means the standby server will wait indefinitely for conflicting queries to complete.
  • Value Range: Integer, range: -1 to INT_MAX, in milliseconds.
  • Default Value: 3s (3000ms)

wal_receiver_status_interval

  • Parameter Description: Sets the maximum time interval for the WAL receiver process to notify the primary server of its status.
  • This parameter is of type SIGHUP.
  • Value Range: Integer, range: 0 to INT_MAX, in milliseconds.
  • Default Value: 5s (5000ms) Notes:
  • Setting this parameter to 0 disables the standby server's feedback to the primary server about the log reception position, which may lead to primary server transaction commit blocking, switchover operation failures, and other anomalies. It is not recommended to set this parameter to 0 in normal business scenarios.

hot_standby_feedback

  • Parameter Description: Sets whether the standby server is allowed to feedback the results of queries executed on it to the primary server, which can avoid query conflicts.
  • This parameter is of type SIGHUP.
  • Value Range: Boolean
  • on: Allows the standby server to feedback the minimum transaction ID of executed queries to the primary server.
  • off: Disallows the standby server from feeding back the minimum transaction ID of executed queries to the primary server.
  • Default Value: off Notes:
  • When this parameter is on, the cleanup of old version data on the primary server will be restricted by the transactions being read on the standby server. The primary server will only clean up changes made by transactions with IDs smaller than those fed back by the standby server. Therefore, enabling this parameter may affect the primary server's performance.

wal_receiver_timeout

  • Parameter Description: Sets the maximum wait time for receiving data from the primary server.
  • This parameter is of type SIGHUP.
  • Value Range: Integer, 0 to INT_MAX, in milliseconds.
  • Default Value: 6s (6000ms)

wal_receiver_connect_timeout

  • Parameter Description: Sets the maximum timeout for connecting to the primary server.
  • This parameter is of type SIGHUP.
  • Value Range: Integer, 0 to INT_MAX / 1000, in seconds.
  • Default Value: 2s

wal_receiver_connect_retries

  • Parameter Description: Sets the maximum number of attempts to connect to the primary server.
  • This parameter is of type SIGHUP.
  • Value Range: Integer, 1 to INT_MAX.
  • Default Value: 1

wal_receiver_buffer_size

  • Parameter Description: The size of the memory buffer for storing Xlog received from the primary and standby servers. The default setup currently does not support the primary-standby-secondary deployment mode.
  • This parameter is of type POSTMASTER.
  • Value Range: Integer, 4096 to 1047552, in KB.
  • Default Value: 64MB (65536KB)

primary_slotname

  • Parameter Description: Sets the slot name of the primary server corresponding to the standby server, used for primary-standby verification and WAL log deletion mechanism.
  • This parameter is of type SIGHUP.
  • Value Range: String
  • Default Value: Empty string

max_logical_replication_workers

  • Parameter Description: The maximum number of apply worker threads for the subscription.
  • This parameter is of type POSTMASTER.
  • Value Range: Integer, 0 to 262143.
  • Default Value: 4

max_sync_workers_per_subscription

  • Parameter Description: The maximum number of tablesync worker threads per subscription.
  • This parameter is of type SIGHUP.
  • Value Range: Integer, 0 to 262143.
  • Default Value: 2
💖 💪 🙅 🚩
congcong
Cong Li

Posted on July 23, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related