Storage-First pattern in AWS with API Gateway, Part 1: using S3

Rob Bulmer

Posted on January 25, 2023

Check out the original post on Medium here:
https://medium.com/@robertbulmer/storage-first-pattern-in-aws-with-api-gateway-part-1-using-s3-216e20b08353

Using S3 to capture each request

👋 Introduction

This post explains the Storage-First pattern for designing serverless architectures, and where it can be useful for adding resiliency and high availability to your serverless workloads.

This example works well in conjunction with serverless Event-Driven Architecture (EDA). The requester receives an immediate HTTP 200 response from API Gateway as soon as the request has been saved. If the requester requires confirmation, consider raising an event to post back, polling, or even WebSockets.

đŸȘŁ Storage-First

When we say Storage-First we mean reliably capturing the entire incoming request to your API using AWS managed services, without the need for your API to validate, parse or transform the request.

By storing the request, we have an exact copy in case the handler encounters an unexpected issue. Potential issues could be:

  • Failure to parse the request
  • Failure to save the processed request to a data store
  • Failure to pass on the request to another downstream or third party service

We can use a number of AWS managed services with the Storage-First pattern, which I will also cover in other posts. These include, but are not limited to, S3, SQS, DynamoDB and EventBridge. Each service has its own use cases, quotas and limitations, and those mentioned above have direct integrations with AWS API Gateway.

🔎 Let’s look at a typical request

Standard API request through API Gateway

Typically, a simple API request is sent via a service to our API Gateway with an attached Lambda handler.

Let’s see what could go wrong:

Lambda failure with API Gateway

Although AWS Lambda has high availability built in, you’re not protected against runtime issues with downstream services, or problems like bugs introduced in the latest release.

If an unexpected problem occurs, you’ve completely lost the request. You may have added appropriate logging to investigate and recover, but this doesn’t help in every situation.

Additionally, say your request came from a third-party vendor, or from a legacy system that ‘fires and forgets’ the request with no ability to retry: then you’ve lost the request forever.

Losing the request — Inability to resend if a runtime error occurs

So what can we do to recover from this scenario?

Let’s store the request in AWS S3 instead; we can then re-drive the request if required without extending functionality on the sender, which may not be possible if:

  ‱ You don’t own the system; it belongs to a third-party vendor
  ‱ The sender is a legacy application that cannot support retries or be extended to retry

Enabling re-drive with S3 Storage-First

Now in an error scenario we can pick up the request data from the S3 bucket and process it again, exactly as it came in.
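
To make re-driving concrete, below is a minimal sketch of a re-drive script using the AWS SDK for JavaScript v3. The processRequest handler, bucket name and object key are assumptions for illustration; the real processing logic lives in your own handler:

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

// Hypothetical handler containing the original processing logic
import { processRequest } from "./handler";

const s3 = new S3Client({ region: "eu-west-1" });

// Re-drive a single stored request by its bucket and object key
export const redrive = async (bucket: string, key: string): Promise<void> => {
  // Fetch the stored request exactly as it was received
  const { Body } = await s3.send(
    new GetObjectCommand({ Bucket: bucket, Key: key })
  );
  const rawRequest = await Body!.transformToString();

  // Replay the request through the original processing logic
  await processRequest(rawRequest);
};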

🚀 Real Scenario

The Storage-First pattern comes into its own when dealing with requests that contain large data payloads from third-party or legacy applications that lack high resiliency and the ability to protect against failures.

Take for instance an application that sends product XML data to our API.

Product XML Service

The product XML has the potential to be varied and vast. For this reason I have chosen AWS S3 as the Storage-First solution: we can store virtually any amount of data in S3 without issue (objects can be up to 5 TB each). If we used SQS, we would be limited to a payload of 256 KB; read more on SQS quotas here.

Above we have two API endpoints:

1) PUT /product

2) PUT /product/{bucket}/{key}

1) PUT /product

This endpoint uploads XML data to a specific bucket created in our AWS CDK stack. The RequestId that is automatically generated by API Gateway is used as the object name inside the bucket.

Let’s see method 1 in action by submitting our raw XML data to the endpoint:

curl --location --request PUT 'https://{apigwid}.execute-api.eu-west-1.amazonaws.com/prod/product/' \
--header 'x-api-key: {apiKey}' \
--header 'Content-Type: application/xml' \
--data-raw '<Product>
<AssetCrossReference Type="Primary Image"/>
<AssetCrossReference Type="Image 02"/>
</Product>'

Now let’s check our S3 Bucket:

Request saved directly in our bucket with the auto-generated RequestId

Let’s have a look at the CDK code:

// Create new Integration method
const putObjectIntegrationAutoName: AwsIntegration = new AwsIntegration({
  service: "s3",
  region: "eu-west-1",
  integrationHttpMethod: "PUT",
  path: "{bucket}/{object}",
  options: {
    credentialsRole: this.apiGatewayRole,
    // Passes the request body to S3 without transformation
    passthroughBehavior: PassthroughBehavior.WHEN_NO_MATCH,
    requestParameters: {
      // Static value: the name of the XML bucket we created above,
      // wrapped in single quotes as API Gateway expects for static values
      "integration.request.path.bucket": `'${targetBucket.bucketName}'`,
      // Specify the object name using the API Gateway context requestId
      "integration.request.path.object": "context.requestId",
      "integration.request.header.Accept": "method.request.header.Accept",
    },
    // Return a 200 response after saving to S3
    integrationResponses: [
      {
        statusCode: "200",
        responseParameters: {
          "method.response.header.Content-Type":
            "integration.response.header.Content-Type",
        },
      },
    ],
  },
});

// Create the endpoint method options
const putObjectMethodOptionsAutoName: MethodOptions = {
  // Protected by API Key
  authorizationType: AuthorizationType.NONE,
  // Require the API Key on all requests
  apiKeyRequired: true,
  requestParameters: {
    "method.request.header.Accept": true,
    "method.request.header.Content-Type": true,
  },
  methodResponses: [
    {
      statusCode: "200",
      responseParameters: {
        "method.response.header.Content-Type": true,
      },
    },
  ],
};

// Assign the integration to the /product resource
productResource.addMethod(
  "PUT",
  putObjectIntegrationAutoName,
  putObjectMethodOptionsAutoName
);
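
The snippet above references a few constructs created elsewhere in the stack: targetBucket, this.apiGatewayRole and productResource. Here is a minimal sketch of how they might be wired together, with construct names assumed for illustration:

import { RestApi } from "aws-cdk-lib/aws-apigateway";
import { Role, ServicePrincipal } from "aws-cdk-lib/aws-iam";
import { Bucket } from "aws-cdk-lib/aws-s3";

// Bucket that will hold the raw XML requests
const targetBucket = new Bucket(this, "ProductXmlBucket");

// Role assumed by API Gateway for the direct S3 integration
this.apiGatewayRole = new Role(this, "ApiGatewayS3Role", {
  assumedBy: new ServicePrincipal("apigateway.amazonaws.com"),
});
// Restrict write access to this bucket only
targetBucket.grantPut(this.apiGatewayRole);

// REST API exposing the /product resource used above
const api = new RestApi(this, "ProductApi");
const productResource = api.root.addResource("product");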

2) PUT /product/{bucket}/{key}

This endpoint uploads XML data to a user-specified S3 bucket and object name. This allows the requester to choose where to put the data and what to call the file.

Note: In this example, our CDK stack has a single bucket and the integration role’s permissions are restricted to that bucket only.

Let’s see what happens when we post to method 2:

curl --location --request PUT 'https://{apigwid}.execute-api.eu-west-1.amazonaws.com/prod/product/{bucketName}/p1234' \
--header 'Content-Type: application/xml' \
--header 'x-api-key: {apikey}' \
--data-raw '<Product>
<AssetCrossReference Type="Primary Image"/>
<AssetCrossReference Type="Image 02"/>
</Product>'

Now check the S3 bucket for our new file “p1234” that we specified in the above request:

Request saved directly in our bucket with the user specified object name

Now let’s see the differences in AWS CDK when using the request path parameters to decide where to store the object:

// Create the new integration method
const putObjectIntegrationUserSpecified: AwsIntegration =
  new AwsIntegration({
    ...
    options: {
      ...
      requestParameters: {
        // Use the bucket name from the request path
        "integration.request.path.bucket": "method.request.path.bucketName",
        // Use the object key from the request path
        "integration.request.path.object": "method.request.path.objectKey",
        "integration.request.header.Accept": "method.request.header.Accept",
      },
      ...
    },
  });
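
Because the mapping now reads method.request.path.bucketName and method.request.path.objectKey, those path parameters must also be declared in the method options. A sketch of what that might look like, assuming a /product/{bucketName}/{objectKey} resource:

const putObjectMethodOptionsUserSpecified: MethodOptions = {
  authorizationType: AuthorizationType.NONE,
  apiKeyRequired: true,
  requestParameters: {
    // Declare the path parameters so API Gateway can map them
    "method.request.path.bucketName": true,
    "method.request.path.objectKey": true,
    "method.request.header.Accept": true,
    "method.request.header.Content-Type": true,
  },
  methodResponses: [
    {
      statusCode: "200",
      responseParameters: {
        "method.response.header.Content-Type": true,
      },
    },
  ],
};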

For completeness, let’s check that our S3 trigger to the Parser Lambda function is working by looking at AWS CloudWatch. The Parser function reads the XML data and converts it to a JS object.

XML being parsed into a JS Object in Lambda using S3 Triggers
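
For reference, here is a minimal sketch of what such a Parser function might look like, assuming the fast-xml-parser package and the AWS SDK for JavaScript v3 (the repository linked below may use different libraries):

import { S3Event } from "aws-lambda";
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { XMLParser } from "fast-xml-parser";

const s3 = new S3Client({});
// Keep XML attributes such as Type="Primary Image" in the output
const parser = new XMLParser({ ignoreAttributes: false });

export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    // Read the raw XML object that triggered this event
    const { Body } = await s3.send(
      new GetObjectCommand({
        Bucket: record.s3.bucket.name,
        Key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
      })
    );
    const xml = await Body!.transformToString();

    // Convert the XML into a plain JS object for further processing
    const product = parser.parse(xml);
    console.log(JSON.stringify(product));
  }
};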

☑ Summary

As we have seen, we can capture request data and store it directly in AWS S3 using direct integrations with API Gateway. This provides a highly resilient solution for managing data requests and allows re-driving requests if there are any unexpected errors.

With a small amount of CDK code we can utilise AWS managed services to provide a robust solution, which is great for requesters that cannot resend requests, or systems you may not have access to or support channels for.

You can extend the above code into a full EDA solution that notifies the requester once the request has been fully processed. Alternatively, if you cannot use EDA within your end-to-end approach, consider using WebSockets or polling to notify the sender once processing is complete.

đŸ‘šâ€đŸ’» Code

All of the code featured in this post can be found here:
https://github.com/rbulmer55/Apigw-to-s3

📣 Getting in touch!

Thank you for reading, and happy building! 🚀

Reach me on LinkedIn here:

https://www.linkedin.com/in/robertbulmer/
