Synchronizing Raw Data in GBase 8s Databases via ER

Introduction

Once you've defined and set up the replication server, executing cdr start replicate will synchronize the initial data from the source database to the target server. This method is straightforward but has the drawback of being relatively slow, making it suitable for environments with small data volumes. The relevant parameter for this operation is CDR_QUEUEMEM.

Synchronization Process

The synchronization process involves adding rows from the source server that do not exist on the target server and modifying rows that exist on both servers but are inconsistent. The strategy for handling rows that exist on the target server but not on the source server is as follows:

Options and Descriptions

delete: Deletes rows and their dependent rows from the target server based on referential integrity constraints.
keep: Retains the rows on the target server.
merge: Keeps the rows on the target server and replicates them back to the source server.

Testing Environment

In the testing environment, a database named testdb is created simultaneously on two instances. Tables t1 through t5 are also created, each with primary key constraints, and initial data is inserted into the tables as follows:

table	group1	group2
t1	> select * from t1; a b 1 1 2 2 4 4 5 5	> select * from t1; a b 1 1 2 1 3 3 5 5
t2	> select * from t2; a b 1 1 5 5 2 2 4 4	> select * from t2; a b 1 1 2 1 3 3 5 5
t3	> select * from t3; a b 1 1 2 2 4 4 5 5	> select * from t3; a b 1 1 2 1 3 3 5 5
t4	> select * from t4; a b 1 1 2 2 4 4 5 5 6 6	> select * from t4; a b 1 1 2 1 3 3 5 5 6 6
t5	> select * from t5; a b 1 1 2 2 4 4 5 5	> select * from t5; a b 1 1 2 1 3 3 5 5

Define Replicate

cdr define replicate --conflict=ignore   rep_testdb_t1 "testdb@group1:gbasedbt.t1" "select * from t1" "testdb@group2:gbasedbt.t1"  "select * from t1"
cdr define replicate --conflict=ignore   rep_testdb_t2 "testdb@group1:gbasedbt.t2" "select * from t2" "testdb@group2:gbasedbt.t2"  "select * from t2"
cdr define replicate --conflict=ignore   rep_testdb_t3 "testdb@group1:gbasedbt.t3" "select * from t3" "testdb@group2:gbasedbt.t3"  "select * from t3"
cdr define replicate --conflict=ignore   rep_testdb_t4 "testdb@group1:gbasedbt.t4" "select * from t4" "testdb@group2:gbasedbt.t4"  "select * from t4"
cdr define replicate --conflict=ignore  rep_testdb_t5 "testdb@group1:gbasedbt.t5" "select * from t5" "testdb@group2:gbasedbt.t5"   "select * from t5"

Now, the replication status is Inactive

[gbasedbt@gbase42 ~]$ cdr list repl rep_testdb_t1 rep_testdb_t2 rep_testdb_t3 rep_testdb_t4 rep_testdb_t5

DEFINED REPLICATES ATTRIBUTES
------------------------------
REPLICATE:        rep_testdb_t1
STATE:            Inactive ON:group2
CONFLICT:         Ignore
FREQUENCY:        immediate
QUEUE SIZE:       0
PARTICIPANT:      testdb:gbasedbt.t1
OPTIONS:          transaction,fullrow
REPLID:           131079 / 0x20007
REPLMODE:         PRIMARY  ON:group2
APPLY-AS:         GBASEDBT ON:group2
REPLTYPE:         Master

REPLICATE:        rep_testdb_t2
STATE:            Inactive ON:group2
CONFLICT:         Ignore
FREQUENCY:        immediate
QUEUE SIZE:       0
PARTICIPANT:      testdb:gbasedbt.t2
OPTIONS:          transaction,fullrow
REPLID:           131080 / 0x20008
REPLMODE:         PRIMARY  ON:group2
APPLY-AS:         GBASEDBT ON:group2
REPLTYPE:         Master

REPLICATE:        rep_testdb_t3
STATE:            Inactive ON:group2
CONFLICT:         Ignore
FREQUENCY:        immediate
QUEUE SIZE:       0
PARTICIPANT:      testdb:gbasedbt.t3
OPTIONS:          transaction,fullrow
REPLID:           131081 / 0x20009
REPLMODE:         PRIMARY  ON:group2
APPLY-AS:         GBASEDBT ON:group2
REPLTYPE:         Master

REPLICATE:        rep_testdb_t4
STATE:            Inactive ON:group2
CONFLICT:         Ignore
FREQUENCY:        immediate
QUEUE SIZE:       0
PARTICIPANT:      testdb:gbasedbt.t4
OPTIONS:          transaction,fullrow
REPLID:           131082 / 0x2000a
REPLMODE:         PRIMARY  ON:group2
APPLY-AS:         GBASEDBT ON:group2
REPLTYPE:         Master

REPLICATE:        rep_testdb_t5
STATE:            Inactive ON:group1
CONFLICT:         Ignore
FREQUENCY:        immediate
QUEUE SIZE:       0
PARTICIPANT:      testdb:gbasedbt.t5
OPTIONS:          transaction,fullrow
REPLID:           131083 / 0x2000b
REPLMODE:         PRIMARY  ON:group1
APPLY-AS:         GBASEDBT ON:group1
REPLTYPE:         Master

Start Replicate

cdr start repl rep_testdb_t1 
cdr start repl rep_testdb_t2 --syncdatasource=group1 --extratargetrows=delete
cdr start repl rep_testdb_t3 --syncdatasource=group1 --extratargetrows=keep
cdr start repl rep_testdb_t4 --syncdatasource=group1 --extratargetrows=merge
cdr start repl rep_testdb_t5 --syncdatasource=group1

Raw Data Comparison:

table	Before Replication Group1	Before Replication Group2	After Replication Group1	After Replication Group2
t1	> select * from t1; a b 1 1 2 2 4 4 5 5	> select * from t1; a b 1 1 2 1 3 3 5 5	> select * from t1; a b 1 1 2 2 4 4 5 5	> select * from t1; a b 1 1 2 1 3 3 5 5
t2	> select * from t2; a b 1 1 5 5 2 2 4 4	> select * from t2; a b 1 1 2 1 3 3 5 5	> select * from t2; a b 1 1 5 5 2 2 4 4	> select * from t2; a b 1 1 2 2 5 5 4 4
t3	> select * from t3; a b 1 1 2 2 4 4 5 5	> select * from t3; a b 1 1 2 1 3 3 5 5	> select * from t3; a b 1 1 2 2 4 4 5 5	> select * from t3; a b 1 1 2 2 3 3 5 5 4 4
t4	> select * from t4; a b 1 1 2 2 4 4 5 5 6 6	> select * from t4; a b 1 1 2 1 3 3 5 5 6 6	> select * from t4; a b 1 1 2 2 4 4 5 5 6 6 3 3	> select * from t4; a b 1 1 2 2 3 3 5 5 6 6 4 4
t5	> select * from t5; a b 1 1 2 2 4 4 5 5	> select * from t5; a b 1 1 2 1 3 3 5 5	> select * from t5; a b 1 1 2 2 4 4 5 5	> select * from t5; a b 1 1 2 2 5 5 4 4

Conclusion:

Without specifying the data source with the -S parameter, initial synchronization is not enabled, and data present before the replication start is not processed.
When using the -S parameter to specify the data source and deleting extra rows on the target side (based on the primary key), the target data will be consistent with the source data.
When using the -S parameter to specify the data source and retaining extra rows on the target side, the target data will be the source data plus the extra rows on the target side.
When using the -S parameter to specify the data source and merging extra rows on the target side, the target and source data will be consistent, both containing the source data plus the extra rows on the target side.
When using the -S parameter to specify the data source, the default behavior for extra rows on the target side is deletion.

The replication technology of GBase 8s database not only improves data availability but also meets diverse business needs through flexible conflict handling strategies. Mastering these technologies will help in building more robust and flexible data architectures.

Blog