Camilo Reyes
Posted on October 25, 2023
In the first part of this series, we built the asynchronous components of a much larger solution. Today, we will build the API interface that can be consumed in real-time.
Recap
So far, we have set up two step functions to process resumes asynchronously. The first step function takes a resume from an S3 bucket and generates a downloadable URL. The second uses Textract to parse the uploaded resume and find the GitHub profile. Both functions take a few seconds to make this data available for consumption in an SQS queue.
What We’ll Cover In This Part
We will build an API with three endpoints:
- A POST endpoint to upload resumes in binary format.
- A GET endpoint to check for incoming messages in the SQS queue.
- Lastly, a DELETE endpoint to nuke old messages that a recruiter has already looked at.
In this take, we will build our API on top of the preexisting step functions. The API will mostly interact with the state machine, S3, and SQS.
Both the client-facing API and the step functions that run in the background are independent of each other: you can change one solution without changing the other, because step functions can run asynchronously and without blocking.
This models the real world more closely, as recruiters don't sit around waiting for candidates to upload resumes. They can simply check for incoming resumes as they are ready for consumption.
Ready? Let’s go!
Resume API: Deploy with Claudia.js
You should already have a main project folder with two step functions. Simply add a third folder with the resume API.
node-aws-step-functions
|-- look-for-github-profile-step
|-- upload-resume-step
`-- resume-uploader-api
Inside resume-uploader-api, run npm init to initialize a package.json file. Make sure the name property in the JSON matches the folder name. This is what sets the name of the Lambda function and the API Gateway API when you deploy the project.
Since we are using Claudia.js to deploy the API, add these dependencies to the project:
> npm i claudia@latest --save-dev
> npm i claudia-api-builder@latest --save
Open the package.json file and specify these commands in the scripts section:
"scripts": {
  "start": "claudia create --region us-east-1 --no-optional-dependencies --api-module pub/bundle --runtime nodejs18.x --memory 1536 --arch arm64",
  "deploy": "claudia update --no-optional-dependencies"
}
Here, we allocate 1.5GB of memory to the Lambda function and target the ARM-based Graviton2 chip. Unless your app has specific code that must run on x86, AWS recommends using their custom ARM chip. Lambda allocates CPU in proportion to the configured memory, so this generous allocation is there to keep latencies low for the client app, not because the app needs that much RAM. Do not worry about how much memory the app actually uses; focus instead on CPU and perceived latencies for actual users.
Be sure to double-check your region and pick the one that is closest to you.
Add these S3, SQS, and SFN dependencies to the project:
> npm i @aws-sdk/client-s3@latest @aws-sdk/client-sfn@latest @aws-sdk/client-sqs@latest --save
Here is what each dependency is for:
- @aws-sdk/client-s3: S3 client used to store uploaded resumes
- @aws-sdk/client-sfn: Step Functions client to start the asynchronous process
- @aws-sdk/client-sqs: SQS client to retrieve and purge messages
Next, create a web.js file and add the following scaffolding:
const {
  PutObjectCommand, // put uploaded resumes in S3
  S3Client,
} = require("@aws-sdk/client-s3");
const {
  SFNClient,
  StartExecutionCommand, // start the state machine
} = require("@aws-sdk/client-sfn");
const {
  SQSClient,
  ReceiveMessageCommand, // receive SQS messages
  PurgeQueueCommand, // purge SQS messages in the queue
} = require("@aws-sdk/client-sqs");
const ApiBuilder = require("claudia-api-builder");
const api = new ApiBuilder();
// double check the region; SDK v3 clients take a config object, not a string
const s3Client = new S3Client({ region: "us-east-1" });
const sfnClient = new SFNClient({ region: "us-east-1" });
const sqsClient = new SQSClient({ region: "us-east-1" });
const s3BucketName = "<unique-bucket-name>";
const stateMachineArn = "<state-machine-arn>";
const queueUrl = "<sqs-queue-url>";
// Rest of the code goes here
module.exports = api;
Assuming you have followed along since the previous post, you should already have an S3 bucket, a state machine, and an SQS queue URL. These values can be found in the AWS console. Double-check that your region is set correctly.
To find the state machine ARN, log in to the AWS console and go to 'Step Functions'. Then, click on your state machine and the ARN should be at the top of the page.
To find the SQS queue URL, go to 'Simple Queue Service' and click on your queue. The URL should be at the top of the page as well.
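Since all three values start out as placeholders, it is easy to deploy with one of them left unedited. The following is a minimal sketch, not part of the original web.js, of a guard that fails fast in that case. The assertConfigured helper is hypothetical, and the ARN and queue URL in the example are made up for illustration.

```javascript
// Hypothetical helper: not part of the original web.js.
// Throws early if any setting is missing, empty, or still an unedited "<...>" placeholder.
function assertConfigured(settings) {
  const problems = Object.entries(settings)
    .filter(
      ([, value]) =>
        typeof value !== "string" ||
        value.trim() === "" ||
        /^<.*>$/.test(value.trim())
    )
    .map(([name]) => name);

  if (problems.length > 0) {
    throw new Error(`Missing configuration: ${problems.join(", ")}`);
  }
  return settings;
}

// Example values only; the account ID, ARN, and URL below are made up.
const settings = assertConfigured({
  s3BucketName: "my-resume-bucket",
  stateMachineArn:
    "arn:aws:states:us-east-1:123456789012:stateMachine:ResumeStateMachine",
  queueUrl: "https://sqs.us-east-1.amazonaws.com/123456789012/resume-queue",
});
```

Calling a guard like this once at the top of web.js would surface a forgotten placeholder as a clear error message rather than as a failed AWS call at request time.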
Add POST Endpoint to the API
In the web.js file, add this POST endpoint to the API:
api.post(
  "/",
  async function (request) {
    const storedFileName = request.queryString.fileName;
    const fileContents = request.body; // binary format 'application/pdf'
    const s3Command = new PutObjectCommand({
      Bucket: s3BucketName,
      Key: storedFileName,
      Body: fileContents,
    });
    await s3Client.send(s3Command);
    const sfnCommand = new StartExecutionCommand({
      input: JSON.stringify({ storedFileName }),
      stateMachineArn,
    });
    await sfnClient.send(sfnCommand);
  },
  {
    success: { code: 204 },
  }
);
This does two things:
- Uploads the resume from the request body into an S3 bucket.
- Kicks off the asynchronous process via the state machine.
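Note that the API returns a 204 No Content response as soon as the upload and the start request complete. This is what keeps perceived latencies low: the API does not wait on the state machine to complete. The await on StartExecutionCommand resolves once the execution has started, not once it has finished, so the async/await used here does not block the API on the background processing.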
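The endpoint uses the fileName query string parameter directly as the S3 object key. As a hedged sketch, the value could be validated first. Both helpers below are hypothetical and not part of the original code, and the rule "a simple name ending in .pdf" is an assumption about what this API should accept.

```javascript
// Hypothetical helper: validates the fileName query string parameter
// before it is used as an S3 object key.
function toStoredFileName(fileName) {
  if (typeof fileName !== "string" || fileName.trim() === "") {
    throw new Error("fileName query string parameter is required");
  }
  // Keep only the last path segment so "uploads/resume.pdf" becomes "resume.pdf".
  const baseName = fileName.trim().split(/[\\/]/).pop();
  // Assumed rule: letters, digits, underscores, dots, dashes, and spaces, ending in .pdf.
  if (!/^[\w.\- ]+\.pdf$/i.test(baseName)) {
    throw new Error("fileName must be a simple name ending in .pdf");
  }
  return baseName;
}

// Builds the same JSON input that the POST endpoint passes to the state machine.
function toExecutionInput(storedFileName) {
  return JSON.stringify({ storedFileName });
}

console.log(toExecutionInput(toStoredFileName("uploads/ExampleResume.pdf")));
```

Inside the handler, this would replace the direct read of request.queryString.fileName. A thrown error then reaches the caller as an error response (Claudia API Builder uses a 500 status by default), so nothing is written to S3 for a bad name.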
GET Endpoint: Check for Messages
Add this to the same web.js file:
api.get("/", async function () {
  const command = new ReceiveMessageCommand({
    QueueUrl: queueUrl,
  });
  const response = await sqsClient.send(command);
  response.Messages =
    response.Messages?.map((message) => JSON.parse(message.Body)) || []; // fallback to empty list
  return response;
});
Processed resumes eventually end up in the SQS queue. Here, we parse the body of each message in the response object. If there are no messages yet, then Messages falls back to an empty array.
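To see how the fallback behaves without calling AWS, here is a small sketch that mirrors the mapping above against simulated responses. The parseMessages helper is hypothetical, and the storedFileName field inside Body is made up for illustration, since the exact message shape depends on what the step functions publish. Unlike the inline version, this one wraps a body that is not valid JSON instead of throwing.

```javascript
// Hypothetical helper mirroring the mapping in the GET endpoint.
// A body that is not valid JSON is wrapped in a { raw } object instead of throwing.
function parseMessages(response) {
  const messages = response.Messages ?? []; // Messages is absent when the queue is empty
  return messages.map((message) => {
    try {
      return JSON.parse(message.Body);
    } catch {
      return { raw: message.Body };
    }
  });
}

// Simulated ReceiveMessage responses; the fields inside Body are made up.
const emptyQueue = {};
const oneResume = {
  Messages: [{ MessageId: "1", Body: '{"storedFileName":"ExampleResume.pdf"}' }],
};

console.log(parseMessages(emptyQueue));
console.log(parseMessages(oneResume));
```

Keep in mind that ReceiveMessage returns at most one message per call unless MaxNumberOfMessages is set (up to 10), so a recruiter may need several GET requests to see everything in the queue.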
DELETE Endpoint: Remove Old Messages
Lastly, add this to the web.js file:
api.delete(
  "/",
  async function () {
    const command = new PurgeQueueCommand({
      QueueUrl: queueUrl,
    });
    await sqsClient.send(command);
  },
  {
    success: { code: 204 },
  }
);
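The purge command clears the SQS queue of all messages, including any that have not been reviewed yet. This lets recruiters clear the queue once they are done reviewing incoming resumes. Note that SQS can take up to 60 seconds to complete a purge and only allows one purge per queue every 60 seconds.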
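If clearing everything at once is too blunt, one alternative is to delete only the messages a recruiter has actually seen. The sketch below is not what this post's API does: it assumes the GET endpoint were changed to keep each message's ReceiptHandle, which the current mapping discards. The hypothetical toDeleteBatchInput helper only builds the request input, which would then be passed to DeleteMessageBatchCommand from @aws-sdk/client-sqs.

```javascript
// Hypothetical helper: builds the input for a DeleteMessageBatch request
// from messages returned by ReceiveMessage.
// SQS accepts at most 10 entries per batch.
function toDeleteBatchInput(queueUrl, messages) {
  if (messages.length > 10) {
    throw new Error("DeleteMessageBatch accepts at most 10 entries");
  }
  return {
    QueueUrl: queueUrl,
    Entries: messages.map((message, index) => ({
      Id: String(index), // only needs to be unique within this batch
      ReceiptHandle: message.ReceiptHandle,
    })),
  };
}

// Simulated messages; the queue URL and receipt handles are made up.
const input = toDeleteBatchInput(
  "https://sqs.us-east-1.amazonaws.com/123456789012/resume-queue",
  [{ ReceiptHandle: "handle-a" }, { ReceiptHandle: "handle-b" }]
);
console.log(JSON.stringify(input, null, 2));
```

For a small review queue like this one, the purge approach in the post is simpler; a batch delete mainly pays off when new resumes can arrive while older ones are still being reviewed.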
All these API endpoints are designed to be fast and completely independent from the complex processing that happens in the background. Before we deploy this to AWS, let’s trim the bundle size so this executes even faster.
Deploy the API on AWS
To create an optimized bundle, install the webpack dependencies:
> npm i webpack@latest webpack-cli@latest --save-dev
Inside the resume-uploader-api folder, create a webpack.config.js file with the following:
const path = require("path");
module.exports = {
  entry: [path.join(__dirname, "web.js")],
  output: {
    path: path.join(__dirname, "pub"),
    filename: "bundle.js",
    libraryTarget: "commonjs",
  },
  target: "node",
  mode: "production",
};
Next, create an .npmignore file to trim the final bundle zip file.
node_modules/
claudia.json
event.json
webpack.config.js
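Lastly, open the package.json file and change the dependencies property to optionalDependencies. Combined with the --no-optional-dependencies flag in the scripts section, this is what keeps all dependencies in the node_modules folder out of the output file. The code they provide is already inside the webpack bundle.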
With webpack in place, simply run npm start to deploy the API on AWS. Once the deployment completes successfully, make a note of the url property in the JSON output, because you will need this later.
Test the API
Be sure to double-check permissions. Find the resume-uploader-api-executor role in the AWS console under IAM. Add AmazonS3FullAccess, AmazonSQSFullAccess, and AWSStepFunctionsFullAccess permissions to this role.
Because API Gateway already handles binary formats, we can simply upload a resume using curl. As long as we set the content type to application/pdf, the gateway and our deploy tool will handle this automatically.
Now find the ExampleResume.pdf used in the previous post. If you created one yourself, use that instead. Then upload a resume with curl:
> curl -i -X POST -H "Content-Type: application/pdf" --data-binary "@ExampleResume.pdf" "https://<GATEWAY-API-ID>.execute-api.<REGION>.amazonaws.com/latest?fileName=ExampleResume.pdf"
This should respond immediately with a 204 HTTP status code. Next, fire another request to check the status of the SQS queue.
> curl -i -X GET -H "Accept: application/json" https://<GATEWAY-API-ID>.execute-api.<REGION>.amazonaws.com/latest
Depending on how fast you type, this may return an empty Messages array because the resume is still being processed. If you wait more than a few seconds, the API will respond with the processed resume instead. Remember, the actual processing is happening in the background. Our API is meant for real-time consumption, so you can come back and check on the queue at any time.
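A client app would typically automate this "check again in a moment" step. Here is a minimal polling sketch under stated assumptions: pollForResumes is a hypothetical helper, fetchMessages stands in for any async function that resolves to the parsed GET response, and the storedFileName field in the simulated message is made up. The demo uses a fake API and skips real delays so it runs without AWS.

```javascript
// Hypothetical polling helper. fetchMessages is any async function that
// resolves to the parsed GET response, e.g. a thin wrapper around fetch().
async function pollForResumes({ fetchMessages, maxAttempts = 5, delayMs = 2000, sleep }) {
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await fetchMessages();
    if (response.Messages.length > 0) {
      return response.Messages;
    }
    if (attempt < maxAttempts) {
      await wait(delayMs); // give the state machine time to finish
    }
  }
  return []; // nothing arrived within the allotted attempts
}

// Simulated API: the queue is empty twice, then a processed resume shows up.
const fakeResponses = [
  { Messages: [] },
  { Messages: [] },
  { Messages: [{ storedFileName: "ExampleResume.pdf" }] },
];
let calls = 0;
const demo = pollForResumes({
  fetchMessages: async () =>
    fakeResponses[Math.min(calls++, fakeResponses.length - 1)],
  sleep: async () => {}, // skip real delays in the demo
});
demo.then((messages) => console.log(`attempts: ${calls}`, messages));
```

Against the deployed API, fetchMessages could be a function that calls the GET endpoint with the global fetch available in Node.js 18+ and returns the parsed JSON body. A fixed delay keeps the sketch short; a real client might back off between attempts.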
You can clear the queue once you are done reviewing resumes.
> curl -i -X DELETE -H "Accept: application/json" https://<GATEWAY-API-ID>.execute-api.<REGION>.amazonaws.com/latest
If you somehow lose the URL with the GATEWAY-API-ID and REGION, log into the AWS console, go to the API Gateway, and click resume-uploader-api. The Invoke URL can be found under 'Stages' (click on the latest).
Wrapping Up
In part one of this series, we deployed Lambda step functions in AWS using Claudia.js. We then built a state machine, deployed our step function to AWS, and tested it. In this part, we built the API interface.
Asynchronous background processing via step functions helps reduce the perceived latency of complex solutions. The API we put in place simply moves this complexity elsewhere so it does not get blocked and force actual users to wait.
Happy coding!
P.S. If you liked this post, subscribe to our JavaScript Sorcery list for a monthly deep dive into more magical JavaScript tips and tricks.
P.P.S. If you need an APM for your Node.js app, go and check out the AppSignal APM for Node.js.