Synchronizing Raw Data in GBase 8s Databases via ER
Cong Li
Posted on July 1, 2024
Introduction
Once you've defined and set up the replication server, executing cdr start replicate
will synchronize the initial data from the source database to the target server. This method is straightforward but has the drawback of being relatively slow, making it suitable for environments with small data volumes. The relevant parameter for this operation is CDR_QUEUEMEM
.
Synchronization Process
The synchronization process involves adding rows from the source server that do not exist on the target server and modifying rows that exist on both servers but are inconsistent. The strategy for handling rows that exist on the target server but not on the source server is as follows:
Options and Descriptions
- delete: Deletes rows and their dependent rows from the target server based on referential integrity constraints.
- keep: Retains the rows on the target server.
- merge: Keeps the rows on the target server and replicates them back to the source server.
Testing Environment
In the testing environment, a database named testdb
is created simultaneously on two instances. Tables t1
through t5
are also created, each with primary key constraints, and initial data is inserted into the tables as follows:
table | group1 | group2 |
---|---|---|
t1 | > select * from t1; a b 1 1 2 2 4 4 5 5 |
> select * from t1; a b 1 1 2 1 3 3 5 5 |
t2 | > select * from t2; a b 1 1 5 5 2 2 4 4 |
> select * from t2; a b 1 1 2 1 3 3 5 5 |
t3 | > select * from t3; a b 1 1 2 2 4 4 5 5 |
> select * from t3; a b 1 1 2 1 3 3 5 5 |
t4 | > select * from t4; a b 1 1 2 2 4 4 5 5 6 6 |
> select * from t4; a b 1 1 2 1 3 3 5 5 6 6 |
t5 | > select * from t5; a b 1 1 2 2 4 4 5 5 |
> select * from t5; a b 1 1 2 1 3 3 5 5 |
Define Replicate
cdr define replicate --conflict=ignore rep_testdb_t1 "testdb@group1:gbasedbt.t1" "select * from t1" "testdb@group2:gbasedbt.t1" "select * from t1"
cdr define replicate --conflict=ignore rep_testdb_t2 "testdb@group1:gbasedbt.t2" "select * from t2" "testdb@group2:gbasedbt.t2" "select * from t2"
cdr define replicate --conflict=ignore rep_testdb_t3 "testdb@group1:gbasedbt.t3" "select * from t3" "testdb@group2:gbasedbt.t3" "select * from t3"
cdr define replicate --conflict=ignore rep_testdb_t4 "testdb@group1:gbasedbt.t4" "select * from t4" "testdb@group2:gbasedbt.t4" "select * from t4"
cdr define replicate --conflict=ignore rep_testdb_t5 "testdb@group1:gbasedbt.t5" "select * from t5" "testdb@group2:gbasedbt.t5" "select * from t5"
Now, the replication status is Inactive
[gbasedbt@gbase42 ~]$ cdr list repl rep_testdb_t1 rep_testdb_t2 rep_testdb_t3 rep_testdb_t4 rep_testdb_t5
DEFINED REPLICATES ATTRIBUTES
------------------------------
REPLICATE: rep_testdb_t1
STATE: Inactive ON:group2
CONFLICT: Ignore
FREQUENCY: immediate
QUEUE SIZE: 0
PARTICIPANT: testdb:gbasedbt.t1
OPTIONS: transaction,fullrow
REPLID: 131079 / 0x20007
REPLMODE: PRIMARY ON:group2
APPLY-AS: GBASEDBT ON:group2
REPLTYPE: Master
REPLICATE: rep_testdb_t2
STATE: Inactive ON:group2
CONFLICT: Ignore
FREQUENCY: immediate
QUEUE SIZE: 0
PARTICIPANT: testdb:gbasedbt.t2
OPTIONS: transaction,fullrow
REPLID: 131080 / 0x20008
REPLMODE: PRIMARY ON:group2
APPLY-AS: GBASEDBT ON:group2
REPLTYPE: Master
REPLICATE: rep_testdb_t3
STATE: Inactive ON:group2
CONFLICT: Ignore
FREQUENCY: immediate
QUEUE SIZE: 0
PARTICIPANT: testdb:gbasedbt.t3
OPTIONS: transaction,fullrow
REPLID: 131081 / 0x20009
REPLMODE: PRIMARY ON:group2
APPLY-AS: GBASEDBT ON:group2
REPLTYPE: Master
REPLICATE: rep_testdb_t4
STATE: Inactive ON:group2
CONFLICT: Ignore
FREQUENCY: immediate
QUEUE SIZE: 0
PARTICIPANT: testdb:gbasedbt.t4
OPTIONS: transaction,fullrow
REPLID: 131082 / 0x2000a
REPLMODE: PRIMARY ON:group2
APPLY-AS: GBASEDBT ON:group2
REPLTYPE: Master
REPLICATE: rep_testdb_t5
STATE: Inactive ON:group1
CONFLICT: Ignore
FREQUENCY: immediate
QUEUE SIZE: 0
PARTICIPANT: testdb:gbasedbt.t5
OPTIONS: transaction,fullrow
REPLID: 131083 / 0x2000b
REPLMODE: PRIMARY ON:group1
APPLY-AS: GBASEDBT ON:group1
REPLTYPE: Master
Start Replicate
cdr start repl rep_testdb_t1
cdr start repl rep_testdb_t2 --syncdatasource=group1 --extratargetrows=delete
cdr start repl rep_testdb_t3 --syncdatasource=group1 --extratargetrows=keep
cdr start repl rep_testdb_t4 --syncdatasource=group1 --extratargetrows=merge
cdr start repl rep_testdb_t5 --syncdatasource=group1
Raw Data Comparison:
table | Before Replication Group1 |
Before Replication Group2 |
After Replication Group1 |
After Replication Group2 |
---|---|---|---|---|
t1 | > select * from t1; a b 1 1 2 2 4 4 5 5 |
> select * from t1; a b 1 1 2 1 3 3 5 5 |
> select * from t1; a b 1 1 2 2 4 4 5 5 |
> select * from t1; a b 1 1 2 1 3 3 5 5 |
t2 | > select * from t2; a b 1 1 5 5 2 2 4 4 |
> select * from t2; a b 1 1 2 1 3 3 5 5 |
> select * from t2; a b 1 1 5 5 2 2 4 4 |
> select * from t2; a b 1 1 2 2 5 5 4 4 |
t3 | > select * from t3; a b 1 1 2 2 4 4 5 5 |
> select * from t3; a b 1 1 2 1 3 3 5 5 |
> select * from t3; a b 1 1 2 2 4 4 5 5 |
> select * from t3; a b 1 1 2 2 3 3 5 5 4 4 |
t4 | > select * from t4; a b 1 1 2 2 4 4 5 5 6 6 |
> select * from t4; a b 1 1 2 1 3 3 5 5 6 6 |
> select * from t4; a b 1 1 2 2 4 4 5 5 6 6 3 3 |
> select * from t4; a b 1 1 2 2 3 3 5 5 6 6 4 4 |
t5 | > select * from t5; a b 1 1 2 2 4 4 5 5 |
> select * from t5; a b 1 1 2 1 3 3 5 5 |
> select * from t5; a b 1 1 2 2 4 4 5 5 |
> select * from t5; a b 1 1 2 2 5 5 4 4 |
Conclusion:
- Without specifying the data source with the
-S
parameter, initial synchronization is not enabled, and data present before the replication start is not processed. - When using the
-S
parameter to specify the data source and deleting extra rows on the target side (based on the primary key), the target data will be consistent with the source data. - When using the
-S
parameter to specify the data source and retaining extra rows on the target side, the target data will be the source data plus the extra rows on the target side. - When using the
-S
parameter to specify the data source and merging extra rows on the target side, the target and source data will be consistent, both containing the source data plus the extra rows on the target side. - When using the
-S
parameter to specify the data source, the default behavior for extra rows on the target side is deletion.
The replication technology of GBase 8s database not only improves data availability but also meets diverse business needs through flexible conflict handling strategies. Mastering these technologies will help in building more robust and flexible data architectures.
Posted on July 1, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
November 29, 2024