Idempotently Create/Update AWS RDS Instances in Python

li_chastina

Chastina Li 👩🏻‍💻

Posted on October 24, 2018

Idempotently Create/Update AWS RDS Instances in Python

There are several ways to manage databases via code, for Amazon Web Service's Relational Database Service(RDS), you can use Terraform, Cloudformation Templates, or API libraries in your favorite language. While Terraform's statefiles are a hassle to managed and sometimes it will delete your database and start from scratch. And CloudFormation templates are too verbose to be managed properly. In this blog post, I'm talking specifically about the Python AWS API library: boto3.

Creating a Postgres database

Firstly, we need to set up the boto client:

import boto3
from botocore.config import Config

boto_config = Config(retries=dict(max_attempts=20))
client = boto3.client(
    'rds', region_name='us-east-1', config=boto_config
)

then let's invoke the create_db_instance() method with a set of configurations:

db_vars = {
    "DBName": "db_name",
    "DBInstanceIdentifier": "instance_name",
    "AllocatedStorage": 20,
    "DBInstanceClass": "db.m3.medium",
    "Engine": "postgres",
    "MasterUsername": "username",
    "MasterUserPassword": "password",
    "VpcSecurityGroupIds": [
        "sg-0007c6489efbd9bca",
    ],
    "DBSubnetGroupName": "my-subnet",
    "DBParameterGroupName": "my-parameter-group",
    "BackupRetentionPeriod": 7,
    "MultiAZ": True,
    "EngineVersion": "10.0.1",
    "PubliclyAccessible": False,
    "StorageType": "gp2",
}
client.create_db_instance(**db_vars)

Updating existing database

Updating database is a bit more complicated because it involves handling idempotency. In computing, an idempotent operation is one that has no additional effect if it is called more than once with the same input parameters. Why is it important? We don't want to update the same set of configurations on the database again and again, this causes confusion from boto and will result in errors.

Part of handling idempotency is reconciliation with existing state. This usually involve computation of differences between desired state and current state. For boto, this means whatever we give modify_db_instance must represent this difference. In Python, this can simply be done with a Dictionary comprehension:

updatable_boto_params = (
    "AllocatedStorage",
    "DBInstanceClass",
    "DBParameterGroupName",
    "DBSubnetGroupName",
    "EngineVersion",
    "Iops",
    "StorageType",
)

updated_params = {
    param: new_params.get(param)
    for param in updatable_boto_params
    if not current_params.get(param) == new_params.get(param)
}

where updatable_boto_params is parameters that you allow to update, you can obviously lock this list down to a very small set of parameters for compliance or security purposes. See the complete list of updatable configurations here.

Now that we have the change set, we can invoke the modify_db_instance:

client.modify_db_instance(**updated_params)

Waiting for it to finish

Database operations are usually long-running, when performing a series of steps sequentially (ie. create database and create a read replica), you need to wait for the synchronous process to finish. One way to tell if database has finished updating itself is by looking at the DBInstanceStatus returned by the 'describe_db_instances' method. Status of available means database has finished whatever it's doing and you can proceed to the next operation.

A better way is to use boto's RDS waiter. It polls describe_db_instances() every 30 seconds until a successful state is reached by default. But you can customize how frequently you want it to pool database status, and how long before you give up and declare the database not ready.

We can also use Python's context manager as a nice little wrapper of the waiter, we can also throw whatever exception handling logic into the context manager just to clean up the code:

from contextlib import contextmanager
from botocore.exceptions import WaiterError

@contextmanager
def wait_for_availability(instance_id, delay=120, retries=60):
    " Yield back control after database is available "
    yield True
    try:
        waiter = self.db_client.get_waiter("db_instance_available")
        waiter.wait(
            DBInstanceIdentifier=instance_id,
            WaiterConfig={"Delay": delay, "MaxAttempts": retries},
        )
    except WaiterError as exception:
        LOG.error("DBInstance: %s is not available.", instance_id)
        raise exception
    else:
        LOG.info("DB instance: %s is now available.", instance_id)
    break


with wait_for_availability(instance_id):
    client.modify_db_instance(**updated_params)

In the code snippet above, we modify our database and pool for its status every 2 minutes and for a total of 60 times. After that the database is either available and we exit, or still unavailable and we raise an exception.

Database operations are usually time consuming, and guarantee idempotence is a major complexity to deal with. Fortunately, boto provides a bunch of very useful APIs to satisfy our needs.

💖 💪 🙅 🚩
li_chastina
Chastina Li 👩🏻‍💻

Posted on October 24, 2018

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related