Exploring AWS CDK - Loading DynamoDB with Custom Resources
Matt Morgan
Posted on January 6, 2020
One of the reasons AWS CDK has me so intrigued is the promise of being able to spin up environments in minutes. If I can provision all my infrastructure, databases and applications with a single structure that is source controlled, I can do all kinds of things most engineering teams have only dreamed of:
- Run N test environments to avoid logjams/branch conflicts.
- Team or individual developer sandboxes spun up (and down) in minutes.
- Isolated environments for CI/CD and test automation strategies.
- Staging/demo/eval/load test environments on demand and discarded after use.
- Customer isolation into separate accounts or VPCs.
Data is the tricky part of pulling this off, so I really wanted to find out whether I could use CDK to load the database I've just provisioned. A fresh developer account with all infrastructure and apps provisioned but NO DATA AT ALL is probably not going to deliver the smooth experience I'm striving for. So how can CDK help me meet this goal?
Table of Contents
- CDK and Tools Review
- tl;dr
- DynamoDB
- Create a Table
- AWS Custom Resource
- Fake Friends via Faker
- Call the API
- Make it Go Faster!
- And Faster!
- Unlimited Data!
- Next Steps
CDK and Tools Review
I explained my thoughts on how to set up CDK projects in my last article. If you want to know why I've changed some of the project setup or my ideas about how linting should be done, it's all there.
tl;dr
Skip the article and check out the code, if you prefer.
DynamoDB
DynamoDB is the managed NoSQL solution from AWS. I'm not going to do a deep dive into DynamoDB here. I chose DynamoDB for this example because it's serverless and fully managed. That'll make it cheap to play around with and fast to provision. I haven't done it yet, but I'm confident we could apply similar techniques to RDS.
Create a Table
There's no need to create schemas or define columns with DynamoDB. I only need to create a Table and specify its PartitionKey attribute.
Naturally this is simple to do in CDK.
import { AttributeType, Table } from '@aws-cdk/aws-dynamodb';
import { Construct, RemovalPolicy, Stack, StackProps } from '@aws-cdk/core';

export class CdkDynamoCustomLoaderStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const tableName = 'friends';

    new Table(this, 'FriendsTable', {
      tableName,
      partitionKey: { name: 'id', type: AttributeType.STRING },
      removalPolicy: RemovalPolicy.DESTROY,
    });
  }
}
I'm creating a table called friends. Since the life of a developer is lonely, I will use an AWS Custom Resource to generate some friends.
AWS Custom Resource
It's a bit daunting at first to think I'm just learning CDK and I already want to go ahead and start creating custom resources, but actually they are pretty simple and straightforward to use. There are two strategies supported by CDK: Provider Framework and Custom Resources for AWS APIs.
Provider Framework lets me write my own custom Lambda handler for resource lifecycle events, while Custom Resources for AWS APIs lets me call AWS APIs during my deployment. The latter is the simpler option, so it's what I'll use in this article.
Fake Friends via Faker
I like using Faker to generate fake data. It has a lot of great options and is almost always good for a laugh. My plan is that I will use an AWS API to insert a fake friend record into the database I've just provisioned. To do that, I'll need a way to generate that data. In order to keep things simple, I'll just add a private method to my stack that knows how to do this.
import { commerce, name, random } from 'faker';

// now inside my stack class, alongside the constructor
private generateItem = () => {
  return {
    id: { S: random.uuid() },
    firstName: { S: name.firstName() },
    lastName: { S: name.lastName() },
    shoeSize: { N: random.number({ max: 25, min: 1, precision: 0.1 }) },
    favoriteColor: { S: commerce.color() },
  };
};
Each attribute specifies the type, in this case S for string and N for number. If I were using MySQL instead of DynamoDB, this would probably be a SQL string.
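The typed attribute format can be produced mechanically from plain values. Here's a minimal sketch, assuming a hypothetical helper called toAttributeValues that is not part of this article's stack. It writes N values as strings, which is how the DynamoDB API reference describes the number type.

```typescript
// Hypothetical helper, not part of the article's code: wraps plain values in
// DynamoDB's typed attribute format (S for strings, N for numbers).
type AttributeValue = { S: string } | { N: string };

function toAttributeValues(
  record: Record<string, string | number>,
): Record<string, AttributeValue> {
  const result: Record<string, AttributeValue> = {};
  Object.keys(record).forEach((key) => {
    const value = record[key];
    // The DynamoDB API documents N as a string, so numbers are stringified.
    result[key] = typeof value === 'number' ? { N: String(value) } : { S: value };
  });
  return result;
}

// Made-up sample values for illustration only.
const sampleFriend = toAttributeValues({ id: 'abc-123', firstName: 'Ada', shoeSize: 9.5 });
```

A helper like this would keep the S and N wrappers out of the data generator, at the cost of one more function to maintain.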
My linter doesn't like the fact that the above method doesn't specify a return type and I like the idea of defining my data types so I'm going to create an interface.
interface IFriend {
  id: { S: string };
  firstName: { S: string };
  lastName: { S: string };
  shoeSize: { N: number };
  favoriteColor: { S: string };
}
Note that the official TypeScript style guide says not to prefix your interface, but my linting rule expects it. I'm just not going to get into it right now.
Call the API
I'll use the AwsCustomResource constructor to call the DynamoDB API. What CDK is going to do here is create a Lambda function and use the SDK for JavaScript to make the call.
import { AwsCustomResource } from '@aws-cdk/custom-resources';

// inside constructor
new AwsCustomResource(this, 'initDBResource', {
  onCreate: {
    service: 'DynamoDB',
    action: 'putItem',
    parameters: {
      TableName: tableName,
      Item: this.generateItem(),
    },
    physicalResourceId: 'initDBData',
  },
});
This code will create a Lambda function that invokes the AWS JavaScript SDK. It will call putItem on the DynamoDB import and pass it my parameters. I can explore this API in the SDK docs, but unfortunately not in the CDK types as they are not narrow enough. Maybe some day.
Note that this creates a resource with the given ID and executes this API call when it's created. There are onUpdate and onDelete calls available too.
With the above code, I can npm run build (or watch) and cdk deploy and I'll find my table gets created and has a single friend in it.
Since I used onCreate, the API call is only made on my first deploy - when the Custom Resource is created. If I changed that to onUpdate, then I'd get a new friend every time I deploy.
To break that down just a little more, when I npm run build, that transpiles the TypeScript code into JavaScript. I now have JavaScript code that calls some faker methods and eventually produces a CloudFormation template. If I'm putting programming structures like conditional statements and loops into my CDK code, it's really important to understand when those conditionals and loops will be evaluated, and that is when the template is generated.
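To make the timing concrete, here's a simplified sketch that doesn't use CDK at all. The synthesize function and the Custom::Example type are hypothetical stand-ins for template generation. The point is only that the loop and the generator run while the template object is being built, so the output holds literal values and no code.

```typescript
// Hypothetical, simplified stand-in for template synthesis. CDK is not used here.
interface FakeResource {
  Type: string;
  Properties: { Parameters: { value: string } };
}

function synthesize(count: number, makeValue: () => string): Record<string, FakeResource> {
  const resources: Record<string, FakeResource> = {};
  // This loop runs now, while the template is being built, not at deploy time.
  for (let i = 0; i < count; i++) {
    resources[`Resource${i}`] = {
      Type: 'Custom::Example', // made-up resource type for illustration
      Properties: { Parameters: { value: makeValue() } },
    };
  }
  return resources;
}

let counter = 0;
// The serialized result contains three resources with fixed values baked in;
// the generator function itself does not survive serialization.
const serialized = JSON.stringify(synthesize(3, () => `friend-${counter++}`));
```

If the generator is random, as faker is, each run of the synthesis step would produce a different template even though the code hasn't changed.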
Make it Go Faster!
Adding just one record on startup might work for some use cases, but what if that's just not enough data to be useful? DynamoDB has a batchWriteItem method that might help. That lets me put 25 items into my table in a single API call. I'm going to add another private method that will help me generate data in batches of 25.
private generateBatch = (batchSize = 25): { PutRequest: { Item: IFriend } }[] => {
  return new Array(batchSize).fill(undefined).map(() => {
    return { PutRequest: { Item: this.generateItem() } };
  });
};
Now I just need to swap putItem with batchWriteItem and update my parameters block to look like this:
parameters: {
  RequestItems: {
    [tableName]: this.generateBatch(),
  },
},
batchWriteItem allows writes to multiple tables, so the payload is just a little different - the put requests are grouped under the name of the table they belong to.
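If the items came from an existing list rather than a generator, they'd need to be split into groups of at most 25 before being wrapped in put requests. Here's a minimal sketch, assuming a hypothetical chunk helper and placeholder items that are not part of this article's stack.

```typescript
// Hypothetical helper: split any list into batches no larger than `size`.
function chunk<T>(items: T[], size = 25): T[][] {
  if (size < 1) {
    throw new Error('size must be at least 1');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Placeholder items for illustration only.
const placeholderItems: { id: { S: string } }[] = [];
for (let i = 0; i < 60; i++) {
  placeholderItems.push({ id: { S: `friend-${i}` } });
}

// Each batch is wrapped in the same PutRequest shape that generateBatch returns.
const requestBatches = chunk(placeholderItems).map((batch) =>
  batch.map((Item) => ({ PutRequest: { Item } })),
);
```

With 60 placeholder items this yields three batches of 25, 25 and 10 requests, each of which could be used as the value for one table name under RequestItems.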
And Faster!
Now what if 25 items still aren't enough? I could put my resource in a loop.
for (let i = 0; i < 10; i++) {
  new AwsCustomResource(this, `initDBResourceBatch${i}`, {
    onCreate: {
      service: 'DynamoDB',
      action: 'batchWriteItem',
      parameters: {
        RequestItems: {
          [tableName]: this.generateBatch(),
        },
      },
      physicalResourceId: `initDBDataBatch${i}`,
    },
  });
}
This will generate 250 items. I could loop even more times, but eventually I will hit the limit of how large my CloudFormation template can be. This technique can write hundreds of items, but likely not thousands and definitely not tens or hundreds of thousands.
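A rough way to reason about that ceiling is to serialize one batch and see how many copies fit in a size budget. The sketch below uses a hypothetical helper, and the budget is a parameter I'd look up in the current CloudFormation quotas rather than hard-code. It counts characters, which equals bytes only for ASCII data, and it ignores the rest of the template unless an overhead is passed in.

```typescript
// Hypothetical estimate of how many serialized batches fit in a size budget.
// `budgetChars` is supplied by the caller; no CloudFormation quota is assumed here.
function maxBatchesWithinBudget(
  sampleBatch: unknown,
  budgetChars: number,
  overheadCharsPerBatch = 0,
): number {
  // Character count is only a proxy for bytes (exact for ASCII content).
  const perBatch = JSON.stringify(sampleBatch).length + overheadCharsPerBatch;
  return Math.floor(budgetChars / perBatch);
}

// Made-up sample batch of 25 small items, used only to show the call shape.
const sampleBatch: { PutRequest: { Item: { id: { S: string } } } }[] = [];
for (let i = 0; i < 25; i++) {
  sampleBatch.push({ PutRequest: { Item: { id: { S: `friend-${i}` } } } });
}
const estimatedBatches = maxBatchesWithinBudget(sampleBatch, 100000);
```

The real number would be lower, since each custom resource adds its own boilerplate to the template, so I'd treat the result as an upper bound.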
Unlimited Data!
If I need to generate more than a few hundred items, I can use the Provider Framework and write my own Lambda function to do exactly what I want. Maybe I'll give that a shot in a future post. For truly large amounts of data, I might need to start looking at Data Pipeline.
Next Steps
I wouldn't consider this example ready for wide use yet, but I've gained a pretty good understanding of Custom Resources and their use. I think to get around template size limits, what I'd really want to do is upload some kind of CSV or JSON payload to S3 and ingest that via Lambda when I create my resources. I would also want to separate my concerns by publishing this as a separate construct or at least importing it into my main stack, not just adding private members to the class.
Hope this was helpful and informative. Would be glad to see others' experiences with loading data via CDK or CloudFormation (or even other means) in the comments!