Bill "The Vest Guy" Penberthy
Posted on August 16, 2022
The AWS Database Migration Service (AWS DMS) was designed to help quickly and securely migrate databases into AWS. The premise is that the source database remains available during the migration to help minimize application downtime. AWS DMS supports homogeneous migrations such as SQL Server to SQL Server or Oracle to Oracle as well as some heterogeneous migrations between different platforms. You can also use the service to continuously replicate data from any supported source to any supported target, meaning you can use DMS for both one-time and ongoing replications. AWS DMS works with relational databases and NoSQL databases as well as other types of data stores. One thing to note, however, is that at least one end of your migration must be on an AWS service; you cannot use AWS DMS to migrate between two on-premises databases.
How Does it Work?
You can best think of DMS as replication software running on a server in the cloud. There are literally dozens of these kinds of tools, some cloud-based, some that you install locally to move data between on-premises systems. DMS’s claim to fame is that you only pay for the work that you have it perform – there is no licensing fee for the service itself like with most of the other software solutions.
Figure 1 shows DMS at a high level. The green box in Figure 1 is the overall service and contains three major subcomponents. Two of these are endpoints used to connect to the source and target databases, and the third is the replication instance.
Figure 1. A high-level look at AWS Database Migration Service
The replication instance is an Amazon EC2 instance that provides the resources necessary to carry out the database migration. You can get high availability and failover support for the replication instance if you choose a Multi-AZ deployment.
AWS DMS uses this replication instance to connect to your source database through the source endpoint. The instance then reads the source data and performs any data formatting necessary to make it compatible with the target database. The instance then loads that data into the target database. Much of this processing is done in memory; however, large data sets may need to be buffered onto disk as part of the transfer. Logs and other replication-specific data are also written onto the replication instance.
Creating a Replication Instance
Enough about the way that it is put together. Let’s jump directly into creating a migration service, and we will go over the various options as they come up in the process.
Note: Not all EC2 instance classes are available for use as a replication instance. As of the time of this writing, only T3 (general purpose), C5 (compute-optimized), and R5 (memory-optimized) Amazon EC2 instance classes can be used. You can use a t3.micro instance under the AWS Free Tier; however, there is a chance that you may be charged if the utilization of the instance over a rolling 24-hour period exceeds the baseline utilization. This will not be a problem in our example, but it may be with other approaches, especially if you use ongoing replication.
You can get to the AWS DMS console by searching for “DMS” or by going into the Migration & Transfer service group and selecting it there. Click the Create replication instance button once you get to the console landing page. This will take you to the creation page. Remember as you go through this that all we are doing here is creating the EC2 instance that DMS will use for processing, so all the questions will be around that.
The fields that you can enter in the Replication instance configuration section are:
- Name – must be unique across all replication instances in the current region
- Descriptive Amazon Resource Name (ARN) – This field is optional, but it allows you to use a friendly name for the ARN rather than the typical set of nonsense that AWS creates by default. This value cannot be changed after creation.
- Description – Short description of the instance
- Instance class – This is where you select the instance class on which your migration process will be running.
- Engine version – This option allows the targeting of previous versions of DMS, or the software that runs within the instance class – though we have no idea why you would ever target an older version.
- Allocated storage – The amount of storage space that you want in your instance. This is where items like log files will be stored, and it will also be used for disk caching if the instance’s memory is not sufficient to handle all of the processing.
- VPC – Where the instance should be run.
- Multi AZ – You can choose between Production workload, which will set up multi-AZ, or Dev or test workload, which will create the instance in a single AZ.
- Publicly accessible – This is necessary if you are looking to connect to databases outside of your VPC, or even outside of AWS.
There are three additional sections that you can configure. The first of these is Advanced security and network configuration, where you can define the specific subnet group for your replication instance, the availability zone in which your replication instance should run, the VPC security groups that you want to be assigned to your replication instance, and the AWS Key Management Service key that you would like used.
The next section is Maintenance, where you can define the weekly maintenance window that AWS will use for maintaining the DMS engine software and operating system. You must have this configured, and AWS will set up a default window for you. The last section that you can configure is, of course, Tags.
Once you click the Create button you will see that your replication instance is being created as shown in Figure 2. This creation process will take several minutes.
Figure 2. Creating a DMS replication instance
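If you would rather script this step than click through the console, the same fields map onto parameters of the DMS CreateReplicationInstance API. The following is a rough sketch using boto3, not something we use elsewhere in this article. The instance name, storage size, and maintenance window are placeholder values that we made up for illustration, and the function names are our own.

```python
def build_replication_instance_request(name, instance_class="dms.t3.micro",
                                       storage_gb=50, multi_az=False,
                                       publicly_accessible=False):
    """Collect the console fields into keyword arguments for boto3."""
    return {
        "ReplicationInstanceIdentifier": name,       # Name
        "ReplicationInstanceClass": instance_class,  # Instance class
        "AllocatedStorage": storage_gb,              # Allocated storage, in GB
        "MultiAZ": multi_az,                         # Multi AZ
        "PubliclyAccessible": publicly_accessible,   # Publicly accessible
        # Placeholder weekly maintenance window, expressed in UTC.
        "PreferredMaintenanceWindow": "sun:06:00-sun:07:00",
    }


def create_replication_instance(request, dms_client=None):
    """Create the instance, wait until it is available, and return its ARN."""
    if dms_client is None:
        import boto3  # imported lazily so the request builder works without it
        dms_client = boto3.client("dms")
    response = dms_client.create_replication_instance(**request)
    arn = response["ReplicationInstance"]["ReplicationInstanceArn"]
    # Creation takes several minutes, so block until DMS reports it as ready.
    waiter = dms_client.get_waiter("replication_instance_available")
    waiter.wait(Filters=[{"Name": "replication-instance-arn", "Values": [arn]}])
    return arn


# Hypothetical instance name; calling create_replication_instance(request)
# would need AWS credentials and would create a billable resource.
request = build_replication_instance_request("example-dms-instance")
```

Leaving out optional values such as the VPC security groups or the KMS key lets the service fall back to its defaults, much like leaving those console fields untouched.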
Now that you have a replication instance, the next step is to create your endpoints.
Creating your Source and Target Endpoints
As briefly mentioned above, the endpoints manage the connection to your source and target databases. They are managed independently from the replication instance because there are many cases where there are multiple replications that talk to a single source or target, such as copying one set of data to one target and another set of data from the same source to a second target such as shown in Figure 3.
Figure 3. Multiple replications against a single source endpoint
To create an endpoint, go into Endpoints and select Create endpoint. This will bring up the Create endpoint screen. Your first option is to define the Endpoint type, as shown in Figure 4.
Figure 4. Endpoint type options when creating a DMS endpoint
Your first option when creating the endpoint is to determine whether the endpoint is going to be a source or target endpoint. You would think that this wouldn’t really matter because a database connection is a database connection whether you are reading or writing, but DMS has made decisions around which databases they will support reading from and which databases you can write to, and, as you can likely predict, they are not the same list. Table 1 lists the different databases supported for each endpoint type, as of the time of this writing.
Database | As Source | As Target |
---|---|---|
Oracle v10.2 and later | X | X |
SQL Server 2005 and later | X | X |
MySQL 5.5 and later | X | X |
MariaDB 10.0.24 and later | X | X |
PostgreSQL 9.4 and later | X | X |
SAP Adaptive Server Enterprise (ASE) 12.5 and above | X | X |
IBM DB2 multiple versions | X | |
Redis 6.x | | X |
Azure SQL Database | X | |
Google Cloud for MySQL | X | |
All RDS instance databases | X | X |
Amazon S3 | X | X |
Amazon DocumentDB | X | X |
Amazon OpenSearch Service | | X |
Amazon ElastiCache for Redis | | X |
Amazon Kinesis Data Streams | | X |
Amazon DynamoDB | | X |
Amazon Neptune | | X |
Apache Kafka | | X |
Table 1. Databases available as sources and targets
The next option in the Endpoint type section is a checkbox to Select RDS DB instance. Checking this box will bring up a dropdown containing a list of RDS instances as shown in Figure 5.
Figure 5. Selecting an RDS database when creating an endpoint
The next section is the Endpoint configuration. It has two primary parts: the first allows you to name the endpoint and select the type of database to which you are connecting, and the second is Endpoint settings, where you can define those additional settings needed to access a specific database. Selecting the Source\Target engine will expand the form, adding some additional fields.
The first of these fields is Access to endpoint database. There are two options available and the choice you make will change the rest of the form. These two options are AWS Secrets Manager, where you use stored secrets for the login credentials, or Provide access information manually where you manually configure the database connection.
Selecting to use AWS Secrets Manager will bring up additional fields as described below. These fields are used to fetch and access the appropriate secret.
- Secret ID – the actual secret to be used when logging into the database
- IAM role – the IAM role that grants AWS DMS the appropriate permissions to use the necessary secret
- Secure Sockets Layer (SSL) mode – whether to use SSL when connecting to the database.
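For those scripting their endpoints, these three fields correspond to parameters of the DMS CreateEndpoint API. Here is a minimal sketch that builds such a request for a SQL Server endpoint. The endpoint name, secret ID, role ARN, and database name are all made-up placeholders, and the helper name is ours rather than anything from AWS.

```python
VALID_ENDPOINT_TYPES = {"source", "target"}
VALID_SSL_MODES = {"none", "require", "verify-ca", "verify-full"}


def build_secret_endpoint_request(identifier, endpoint_type, secret_id,
                                  role_arn, database_name, ssl_mode="none"):
    """Build CreateEndpoint arguments that pull credentials from a secret."""
    if endpoint_type not in VALID_ENDPOINT_TYPES:
        raise ValueError(f"endpoint_type must be one of {sorted(VALID_ENDPOINT_TYPES)}")
    if ssl_mode not in VALID_SSL_MODES:
        raise ValueError(f"ssl_mode must be one of {sorted(VALID_SSL_MODES)}")
    return {
        "EndpointIdentifier": identifier,
        "EndpointType": endpoint_type,
        "EngineName": "sqlserver",
        "DatabaseName": database_name,
        "SslMode": ssl_mode,                          # SSL mode
        "MicrosoftSQLServerSettings": {
            "SecretsManagerSecretId": secret_id,      # Secret ID
            "SecretsManagerAccessRoleArn": role_arn,  # IAM role
        },
    }


# Placeholder values only; substitute your own secret and role.
secret_request = build_secret_endpoint_request(
    identifier="example-sqlserver-source",
    endpoint_type="source",
    secret_id="example/dms/sqlserver",
    role_arn="arn:aws:iam::123456789012:role/example-dms-secret-access",
    database_name="ExampleDb",
    ssl_mode="require",
)
```

The resulting dictionary could then be passed along as `boto3.client("dms").create_endpoint(**secret_request)`. Other engines use their own settings block in place of `MicrosoftSQLServerSettings`, so treat this as specific to SQL Server.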
Selecting to Provide access information manually brings up the various fields necessary to connect to that identified engine. Figure 6 shows what this looks like when connecting to a SQL Server, and hopefully, all these values look familiar because we have used them multiple times in earlier articles.
Figure 6. Providing SQL Server information manually for an endpoint
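The manual fields have API equivalents as well. The sketch below assumes a SQL Server endpoint on the default port of 1433; the server name, login, and database are placeholders, and in real use you would not want a password sitting in source code.

```python
def build_manual_endpoint_request(identifier, endpoint_type, server_name,
                                  username, password, database_name,
                                  port=1433):
    """Build CreateEndpoint arguments from manually supplied connection info."""
    return {
        "EndpointIdentifier": identifier,
        "EndpointType": endpoint_type,  # "source" or "target"
        "EngineName": "sqlserver",
        "ServerName": server_name,
        "Port": port,
        "Username": username,
        "Password": password,
        "DatabaseName": database_name,
    }


def create_endpoint(request, dms_client=None):
    """Create the endpoint and return its ARN."""
    if dms_client is None:
        import boto3  # imported lazily so the request builder works without it
        dms_client = boto3.client("dms")
    response = dms_client.create_endpoint(**request)
    return response["Endpoint"]["EndpointArn"]


# Placeholder connection details for illustration only.
manual_request = build_manual_endpoint_request(
    identifier="example-sqlserver-target",
    endpoint_type="target",
    server_name="sqlserver.example.internal",
    username="dms_user",
    password="replace-me",
    database_name="ExampleDb",
)
```

Hold on to the ARN that `create_endpoint` returns, because it is how the endpoint is identified in later API calls.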
The next section is the Endpoint settings section. The purpose of this section is to add any additional settings that may be necessary for this particular instance of the database to which it is connecting. There are two ways in which you can provide this information. The first is through a Wizard, while the second is through an Editor. When using the Wizard approach, clicking the Add new setting button will bring up a Setting \ Value row, with the Setting being a drop-down list of known settings as shown in Figure 7. These values will be different for each engine as well as whether you are using the endpoint as a source or a target.
Figure 7. Endpoint settings section when creating a SQL Server endpoint
Selecting to use the Editor approach will bring up a large text box where you can enter the endpoint settings in JSON format. This would likely be the best approach if you need to configure multiple DMS endpoints with the same additional settings.
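One way to keep those settings consistent is to generate the JSON from a single definition and paste the result into each endpoint. This is only a sketch: it assumes the Editor accepts a flat JSON object of setting names and values, and the three SQL Server settings shown were picked as examples rather than recommendations, so check them against the settings list for your engine.

```python
import json

# Example SQL Server settings to share across endpoints (illustrative values).
SHARED_SQLSERVER_SETTINGS = {
    "BcpPacketSize": 16384,
    "ReadBackupOnly": False,
    "UseBcpFullLoad": True,
}


def settings_as_json(settings):
    """Render settings as stable, readable JSON for pasting into the Editor."""
    return json.dumps(settings, indent=2, sort_keys=True)


print(settings_as_json(SHARED_SQLSERVER_SETTINGS))
```

Sorting the keys keeps the output identical from run to run, which makes it easier to spot an endpoint whose settings have drifted from the rest.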
Once you have the Endpoint configuration section complete, the next section is KMS key, where you select the appropriate key to be used when encrypting the data that you have input into the configuration. The next section is Tags. The last section, entitled Test endpoint connection (optional), is shown in Figure 8 and is where you can test all the information that you have just filled out.
Figure 8. Testing an endpoint configuration
There are two values that you must identify before you can run the test: the VPC and the replication instance that you want to use, which is why we had you create the replication instance first! These are necessary because these are the resources that will be used to perform the work of connecting to the database. Once the values are selected, click the Run test button. After a surprising amount of time where you see indications that the test is running, you should get confirmation that your test was successful. This output is shown in Figure 9.
Figure 9. Successful test on an endpoint configuration
Obviously, you will need to configure at least one source endpoint and one target endpoint before you can run DMS end to end. However, you also need to make sure that you have each of them configured before you can configure the database migration task. We’ll finish that up in the next article!