Implementing Change Data Capture (CDC) with Aurora Serverless v2

omarrosadio

Posted on April 22, 2022

Aurora Serverless v2 is now available after a long wait, and it claims to solve many of the issues and limitations of its predecessor.
One of the limitations of Aurora Serverless v1 is that it cannot be used for CDC (Change Data Capture) with AWS DMS (Database Migration Service); with v2, this is now possible.
In this guide we will review, step by step, the configuration and implementation of CDC with Aurora Serverless v2 and AWS DMS.

Solution diagram

The configuration is essentially the same as configuring CDC on a provisioned Aurora DB cluster. Also note that all of the steps below assume the us-east-1 (N. Virginia) region.

Creating the Database
As a first step, we need to create the parameter group, the subnet group and the database cluster. I am using almost all the default parameters, changing only the ones required to enable CDC.

Parameter Group
Click on the 'Create Parameter Group' button:
Create Parameter Group

Select:
Parameter Group family: aurora-mysql8.0
(Currently it is the only one compatible with Aurora Serverless v2)

Type:
DB Cluster Parameter Group

Group Name:
pg-servelessv2-cdc

Description:
Enable change data capture

Configuring Parameter Group

Once the Parameter group is created, select it and click on 'Edit parameters':
Configuring Parameter Group

We need to change these parameters:
binlog_format: ROW
binlog_checksum: NONE

Configuring Parameter Group

And save changes:
Parameter Group created

Subnet group
Using a subnet group we can specify in which subnets the database will be deployed. In this case I will select public subnets to be able to connect to the DB easily.

Click on the 'Create DB subnet group' button:
Creating subnet group

Name:
sg-db-publicaccess
Description:
Using public subnets for demo only
VPC:
your_VPC_id
Availability Zones:
az_ids (it is mandatory to select at least two)
Configuring subnet group

And now the subnet group is created:
Subnet group created

Database

Click on the 'Create Database' button:
Creating database

Database creation method: Standard create
Engine type: Amazon Aurora
Edition: Amazon Aurora MySQL-Compatible Edition
Replication features: Single-master (default)
Engine version: Aurora MySQL 3.02.0 (compatible with MySQL 8.0.23)
Templates: Dev/Test
DB cluster identifier: db-test-cdc
Master username: admin
Master password: supersecretpassword
DB instance class: Serverless v2 - new
Capacity range - Minimum ACUs: 0.5
Capacity range - Maximum ACUs: 1
Multi-AZ deployment: Don't create an Aurora Replica
Virtual private cloud (VPC): your_VPC_id
Subnet group: sg-db-publicaccess (the subnet group previously created)
Public access: Yes
VPC security group: Create new
New VPC security group name: rds-publicaccess
Availability Zone: No preference
Database port: 3306
Database authentication options: Password authentication
Initial database name: sampledb
DB cluster parameter group: pg-servelessv2-cdc (the parameter group previously created)
DB parameter group: default.aurora-mysql8.0

Creating database

Click on Create database (it takes approximately 15 minutes to complete):
Creating database

Also modify the created security group (rds-publicaccess) to allow inbound traffic on port 3306 from 0.0.0.0/0

Modifying Security Group
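
Once the cluster is available and reachable, it is worth confirming from a MySQL client that the cluster parameter group took effect. As an optional extra step, AWS also recommends extending the binlog retention when using DMS CDC, which is done with an RDS stored procedure (24 hours below is just a reasonable starting point):

SHOW VARIABLES LIKE 'binlog_format';    -- expected: ROW
SHOW VARIABLES LIKE 'binlog_checksum';  -- expected: NONE

-- Keep binlog files around long enough for DMS to read them
CALL mysql.rds_set_configuration('binlog retention hours', 24);
CALL mysql.rds_show_configuration;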

Creating the S3 bucket
The next step is to create an S3 bucket which will be used as the target destination for the full load and change data capture task.
It does not require any special configuration; we can use a bucket with default creation parameters:
Name: test-dms-s3-target

Creating S3 bucket

Creating the DMS Replication Instance

We need to create the DMS replication subnet group. Similar to the database subnet group, it specifies the subnets that will be used. In this case I am selecting private subnets, but any subnets with proper connectivity to the database instance and the S3 bucket will work.
Name: sg-dms

Creating subnet group

Additionally, create the security group that will be attached to the DMS replication instance. It only needs to allow outbound traffic.
Creating DMS security group

And create the DMS Replication Instance:
Creating DMS Replication Instance

Name: dms-instance
Instance class: dms.t3.small
Engine version: 3.4.6
Allocated storage: 20GB
VPC: The same VPC the RDS instance belongs to
Multi AZ: Dev or test workload
Publicly accessible: No
Replication subnet group: sg-dms (created previously)
Availability zone: No preference
VPC security group(s): dms-replication-instance (created previously)
KMS key: Default

Creating DMS Replication Instance

Inserting sample data into the database
Before creating the DMS task, I am going to insert sample data into the database to validate that the task works as expected. The data comes from the official MySQL samples, specifically the Sakila sample database; there are only two scripts to execute, so it is very quick to replicate.
Connect to the database using the configured user and password, setting the database endpoint accordingly (in case of connection errors, check that the security group allows traffic on port 3306 from your IP range):

Connecting to DB

Then execute the downloaded scripts: sakila-schema.sql and sakila-data.sql

Inserting sample data

And now we have populated the database:
Inserting sample data
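
As a quick sanity check, we can confirm the load from the MySQL client (the Sakila dataset ships with 599 rows in the customer table):

USE sakila;
SHOW TABLES;
SELECT COUNT(*) FROM customer;  -- should return 599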

Creating the DMS Tasks
Create the Source Endpoint to connect to the database and the Target Endpoint to connect to the S3 bucket:

Creating endpoint

Creating source endpoint

And test the connection:
Testing connection

Before creating the Target Endpoint for the S3 bucket, we need to create an IAM role granting access to put/delete objects in the bucket:

Select DMS as the AWS service:
Creating IAM Role
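
Selecting DMS as the trusted service makes the console generate a trust relationship equivalent to the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "dms.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}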

Don't select any policies for now:
Creating IAM Role

Select a proper name and create the role:
Creating IAM Role

Creating IAM Role

And Add permissions -> Create inline policy
Creating IAM Role

The policy is as follows (change the bucket name to the corresponding value):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::test-dms-s3-target/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::test-dms-s3-target"
    }
  ]
}

Creating IAM Role

Now, we can create the Target endpoint:
Creating Target Endpoint
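
Optionally, the S3 target endpoint accepts extra connection attributes; for example, addColumnName=true makes DMS write the column names as the first row of each output file. This is not required for this demo, just a convenience:

addColumnName=true;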

And test the connection:
Testing Target Endpoint

To create the Database Migration Task we use the resources previously created:

Task identifier: full-load-and-cdc-auroraserverless

Creating dms task

For the task settings, in this demo we can leave the default parameters, except for enabling CloudWatch logs, which will be helpful to validate that everything is OK:

Creating dms task
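
For reference, in the JSON view of the task settings this corresponds to the Logging section (a minimal excerpt; the console fills in the individual log components for you):

{
  "Logging": {
    "EnableLogging": true
  }
}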

For Table mappings we only need to specify the schema (sakila in this case) and leave the other parameters with default values:

Creating dms task
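
If you prefer the JSON editor, the equivalent selection rule looks like this (the rule id and name are arbitrary values):

{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "sakila",
        "table-name": "%"
      },
      "rule-action": "include"
    }
  ]
}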

And finally click on Create task:
Creating dms task

Validating the CDC feature
Let's keep the task running for a few minutes until the status is "Load complete, replication ongoing":
Validating task

Check in the table statistics that the data was loaded successfully:
Validating task

The relevant fields are Full load rows and Total rows:
Validating task

If you navigate through the bucket objects, you will notice there is a folder for each table:
Validating task

The first full load is structured as follows:
Validating task

To see CDC in action, let's insert some rows:

INSERT INTO `sakila`.`customer` (`customer_id`, `store_id`, `first_name`, `last_name`, `email`, `address_id`, `active`, `create_date`, `last_update`) VALUES ('600', '2', 'JOHN', 'SMITH', 'JOHN.SMITH@sakilacustomer.org', '605', '1', '2006-02-14 22:04:37', '2006-02-15 04:57:20');
INSERT INTO `sakila`.`customer` (`customer_id`, `store_id`, `first_name`, `last_name`, `email`, `address_id`, `active`, `create_date`, `last_update`) VALUES ('601', '2', 'WILL', 'CARTER', 'WILL.CARTER@sakilacustomer.org', '605', '1', '2006-02-14 22:04:37', '2006-02-15 04:57:20');
INSERT INTO `sakila`.`customer` (`customer_id`, `store_id`, `first_name`, `last_name`, `email`, `address_id`, `active`, `create_date`, `last_update`) VALUES ('602', '2', 'DONALD', 'JACKSON', 'DONALD.JACKSON@sakilacustomer.org', '605', '1', '2006-02-14 22:04:37', '2006-02-15 04:57:20');

And also delete some rows from the customer table:

DELETE FROM `sakila`.`customer` WHERE (`customer_id` = '9');
DELETE FROM `sakila`.`customer` WHERE (`customer_id` = '15');
DELETE FROM `sakila`.`customer` WHERE (`customer_id` = '4');
DELETE FROM `sakila`.`customer` WHERE (`customer_id` = '433');
DELETE FROM `sakila`.`customer` WHERE (`customer_id` = '599');

After a few minutes (typically 2 or 3), the changes should be reflected in both the task statistics and the S3 bucket:
Validating task

Validating task
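
DMS writes the initial full load as LOAD*.csv files and the ongoing changes as separate timestamped CDC files in the same table folders. In the CDC files, each record is prefixed with the operation type: I for insert, U for update and D for delete. The statements above should therefore produce rows similar to these (values illustrative, abbreviated):

I,600,2,JOHN,SMITH,JOHN.SMITH@sakilacustomer.org,605,1,2006-02-14 22:04:37,2006-02-15 04:57:20
D,9,...
D,15,...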

Conclusion
With the new version of Aurora Serverless there is no longer any limitation on setting up DMS migration tasks with Change Data Capture. This lets us take advantage of the serverless model, and the configuration is very similar to the provisioned version.
