AWS Step Functions with Node.js: Build an API

beautifulcoder

Camilo Reyes

Posted on October 25, 2023

In the first part of this series, we built the asynchronous components of a much larger solution. Today, we will build the API that clients can consume in real time.

Recap

So far, we have set up two step functions to process resumes asynchronously. The first step function takes a resume from an S3 bucket and generates a downloadable URL. The second uses Textract to parse the uploaded resume and find the GitHub profile. Both functions take a few seconds to make this data available for consumption in an SQS queue.

What We’ll Cover In This Part

We will build an API with three endpoints:

  • A POST endpoint to upload resumes in binary format.
  • A GET endpoint to check for incoming messages in the SQS queue.
  • Lastly, a DELETE endpoint to nuke old messages that a recruiter has already looked at.

In this take, we will build our API on top of the preexisting step functions. The API will mostly interact with the state machine, S3, and SQS.

Both the client-facing API and the step functions that run in the background are independent of each other: you can change one solution without changing the other, because step functions can run asynchronously and without blocking.

This models the real world more closely, as recruiters don't sit around waiting for candidates to upload resumes. They can simply check for incoming resumes as they are ready for consumption.

Ready? Let’s go!

Resume API: Deploy with Claudia.js

You should already have a main project folder with two step functions. Simply add a third folder with the resume API.

node-aws-step-functions
|-- look-for-github-profile-step
|-- upload-resume-step
`-- resume-uploader-api

Inside resume-uploader-api, run npm init to initialize a package.json file. Make sure the name property in the JSON matches the folder name. This is what sets the name of the Lambda function and the API Gateway API when you deploy the project.

Since we are using Claudia.js to deploy the API, add these dependencies to the project:

> npm i claudia@latest --save-dev
> npm i claudia-api-builder@latest --save

Open the package.json file and specify these commands in the scripts section:

"scripts": {
  "start": "claudia create --region us-east-1 --no-optional-dependencies --api-module pub/bundle --runtime nodejs18.x --memory 1536 --arch arm64",
  "deploy": "claudia update --no-optional-dependencies"
}

Here, we allocate 1.5GB of memory to the Lambda function and target the ARM-based Graviton2 chip. Unless your app has specific code that must run on x86, AWS recommends using their custom ARM chip. Lambda allocates CPU power in proportion to the configured memory, so this generous allocation helps keep latencies low for the client app. Do not size the function by how much memory the app actually uses; focus instead on CPU power and perceived latencies for actual users.

Be sure to double-check your region and pick the one that is closest to you.

Add these S3, SQS, and SFN dependencies to the project:

> npm i @aws-sdk/client-s3@latest @aws-sdk/client-sfn@latest @aws-sdk/client-sqs@latest --save

Here is what each dependency is for:

  • @aws-sdk/client-s3: S3 client used to store uploaded resumes
  • @aws-sdk/client-sfn: Step functions client to start the asynchronous process
  • @aws-sdk/client-sqs: SQS client to retrieve and purge messages

Next, create a web.js file and add the following scaffolding:

const {
  PutObjectCommand, // put uploaded resumes in S3
  S3Client,
} = require("@aws-sdk/client-s3");
const {
  SFNClient,
  StartExecutionCommand, // start the state machine
} = require("@aws-sdk/client-sfn");
const {
  SQSClient,
  ReceiveMessageCommand, // receive SQS messages
  PurgeQueueCommand, // purge SQS messages in the queue
} = require("@aws-sdk/client-sqs");
const ApiBuilder = require("claudia-api-builder");

const api = new ApiBuilder();

// double-check the region; SDK v3 clients take a configuration object
const s3Client = new S3Client({ region: "us-east-1" });
const sfnClient = new SFNClient({ region: "us-east-1" });
const sqsClient = new SQSClient({ region: "us-east-1" });

const s3BucketName = "<unique-bucket-name>";
const stateMachineArn = "<state-machine-arn>";
const queueUrl = "<sqs-queue-url>";

// The rest of the code goes here

module.exports = api;

Assuming you have followed along since the previous post, you should already have an S3 bucket, a state machine, and an SQS queue URL. These values can be found in the AWS console. Double-check that your region is set correctly.

To find the state machine ARN, log in to the AWS console and go to 'Step Functions'. Then, click on your state machine and the ARN should be at the top of the page.

To find the SQS queue URL, go to 'Simple Queue Service' and click on your queue. The URL should be at the top of the page as well.
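
Since a mismatched region is an easy mistake to make when copying these values, here is a minimal sketch of a sanity check. The helper names (regionFromStateMachineArn, regionFromQueueUrl, assertSameRegion) are hypothetical and not part of the original project, and the queue URL check assumes the https://sqs.<region>.amazonaws.com/<account-id>/<queue-name> layout.

```javascript
// Hypothetical helpers: pull the region out of each identifier and compare.
function regionFromStateMachineArn(arn) {
  // ARN layout: arn:aws:states:<region>:<account-id>:stateMachine:<name>
  const parts = arn.split(":");
  if (parts.length < 7 || parts[2] !== "states") {
    throw new Error(`Not a Step Functions ARN: ${arn}`);
  }
  return parts[3];
}

function regionFromQueueUrl(queueUrl) {
  // Assumes the https://sqs.<region>.amazonaws.com/<account-id>/<queue> layout
  const host = new URL(queueUrl).hostname;
  const match = /^sqs\.([a-z0-9-]+)\.amazonaws\.com$/.exec(host);
  if (!match) {
    throw new Error(`Unexpected SQS queue URL: ${queueUrl}`);
  }
  return match[1];
}

function assertSameRegion(clientRegion, stateMachineArn, queueUrl) {
  const regions = {
    client: clientRegion,
    stateMachine: regionFromStateMachineArn(stateMachineArn),
    queue: regionFromQueueUrl(queueUrl),
  };
  // All three values must agree, otherwise the API calls go to the wrong region
  if (new Set(Object.values(regions)).size !== 1) {
    throw new Error(`Region mismatch: ${JSON.stringify(regions)}`);
  }
  return clientRegion;
}
```

Calling assertSameRegion("us-east-1", stateMachineArn, queueUrl) once at startup would surface a copy-paste mistake before the first request hits AWS.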

Add POST Endpoint to the API

In the web.js file, add this POST endpoint to the API:

api.post(
  "/",
  async function (request) {
    const storedFileName = request.queryString.fileName;
    const fileContents = request.body; // binary format 'application/pdf'

    const s3Command = new PutObjectCommand({
      Bucket: s3BucketName,
      Key: storedFileName,
      Body: fileContents,
    });

    await s3Client.send(s3Command);

    const sfnCommand = new StartExecutionCommand({
      input: JSON.stringify({ storedFileName }),
      stateMachineArn,
    });

    await sfnClient.send(sfnCommand);
  },
  {
    success: { code: 204 },
  }
);

This does two things:

  • Uploads the resume from the request body into an S3 bucket.
  • Kicks off the asynchronous process via the state machine.

Note that the API returns a 204 No Content response as soon as the resume is stored and the execution has started. StartExecutionCommand only starts the state machine and does not wait for it to finish, which is what keeps perceived latencies low. The await calls here cover the upload and the start request, not the background processing.
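
The POST endpoint trusts that the fileName query string parameter and the request body are present. As a rough sketch of how the inputs for both AWS calls could be prepared and checked before anything is sent, consider the helper below. The name buildUploadInputs and its error messages are assumptions for illustration, not part of the original code.

```javascript
// Hypothetical helper that prepares the inputs for both AWS calls.
function buildUploadInputs({ fileName, body, bucket, stateMachineArn }) {
  if (typeof fileName !== "string" || fileName.trim() === "") {
    throw new Error("The fileName query string parameter is required");
  }
  if (!body || body.length === 0) {
    throw new Error("The request body is empty");
  }

  return {
    // Same shape that the PutObjectCommand constructor receives in the endpoint
    putObjectInput: { Bucket: bucket, Key: fileName, Body: body },
    // The state machine input must be a JSON string
    startExecutionInput: {
      input: JSON.stringify({ storedFileName: fileName }),
      stateMachineArn,
    },
  };
}
```

Inside the endpoint, putObjectInput and startExecutionInput would be passed to new PutObjectCommand(...) and new StartExecutionCommand(...) respectively, so a bad request fails before S3 is touched.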

GET Endpoint: Check for Messages

Add this to the same web.js file:

api.get("/", async function () {
  const command = new ReceiveMessageCommand({
    QueueUrl: queueUrl,
  });

  const response = await sqsClient.send(command);

  response.Messages =
    response.Messages?.map((message) => JSON.parse(message.Body)) || []; // fallback to empty list

  return response;
});

Processed resumes eventually end up in the SQS queue. Here, we check for messages in the response object. If there are no messages yet, then we return an empty array.
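
To see the fallback in isolation, the mapping can be pulled into a small function and run against hand-written responses, with no AWS calls involved. The function name toMessageBodies is hypothetical, and the sample message body is a placeholder rather than the real payload produced by the step functions.

```javascript
// Same mapping as the GET endpoint, extracted so it can run without AWS.
function toMessageBodies(response) {
  // Messages is absent when the queue has nothing to deliver
  return response.Messages?.map((message) => JSON.parse(message.Body)) || [];
}

// Made-up responses; the body fields are placeholders, not the real payload
const emptyResponse = { $metadata: { httpStatusCode: 200 } };
const readyResponse = {
  Messages: [
    {
      MessageId: "example-id",
      Body: JSON.stringify({ storedFileName: "ExampleResume.pdf" }),
    },
  ],
};

console.log(toMessageBodies(emptyResponse)); // []
console.log(toMessageBodies(readyResponse)); // [ { storedFileName: 'ExampleResume.pdf' } ]
```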

DELETE Endpoint: Remove Old Messages

Lastly, add this to the web.js file:

api.delete(
  "/",
  async function () {
    const command = new PurgeQueueCommand({
      QueueUrl: queueUrl,
    });

    await sqsClient.send(command);
  },
  {
    success: { code: 204 },
  }
);

The purge command clears the SQS queue of any messages. This lets recruiters clear the queue once they are done reviewing incoming resumes.
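
One thing to keep in mind is that SQS accepts only one purge per queue every 60 seconds. A client app could track when it last called the DELETE endpoint and hold off until the window has passed. The sketch below shows one way to do that; the helper msUntilNextPurge is hypothetical and the bookkeeping of the last purge time is left to the caller.

```javascript
// SQS allows a single PurgeQueue call per queue every 60 seconds
const PURGE_COOLDOWN_MS = 60 * 1000;

// Hypothetical helper: returns how many milliseconds to wait before purging again.
function msUntilNextPurge(lastPurgeAtMs, nowMs = Date.now()) {
  if (lastPurgeAtMs == null) {
    return 0; // never purged before, so go ahead
  }
  const elapsed = nowMs - lastPurgeAtMs;
  return Math.max(0, PURGE_COOLDOWN_MS - elapsed);
}
```

A recruiter-facing UI might use this to disable a "clear queue" button until the returned value reaches zero.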

All these API endpoints are designed to be fast and completely independent from the complex processing that happens in the background. Before we deploy this to AWS, let’s trim the bundle size so this executes even faster.

Deploy the API on AWS

To create an optimized bundle, install the webpack dependencies:

> npm i webpack@latest webpack-cli@latest --save-dev

Inside the resume-uploader-api folder, create a webpack.config.js file with the following:

const path = require("path");

module.exports = {
  entry: [path.join(__dirname, "web.js")],
  output: {
    path: path.join(__dirname, "pub"),
    filename: "bundle.js",
    libraryTarget: "commonjs",
  },
  target: "node",
  mode: "production",
};

Next, create an .npmignore file to trim the final bundle zip file.

node_modules/
claudia.json
event.json
webpack.config.js

Lastly, open the package.json file and rename the dependencies property to optionalDependencies. Combined with the --no-optional-dependencies flag in the Claudia commands, this keeps the packages in the node_modules folder out of the deployment package, since webpack already bundles them into the output file.

With webpack in place, run npx webpack to generate pub/bundle.js, then run npm start to deploy the API on AWS. Once the deployment completes successfully, make a note of the url property in the JSON output, because you will need this later.

Test the API

Be sure to double-check permissions. Find the resume-uploader-api-executor role in the AWS console under IAM. Add AmazonS3FullAccess, AmazonSQSFullAccess, and AWSStepFunctionsFullAccess permissions to this role.

Resume API Permissions

Because API Gateway already handles binary content, we can simply upload a resume using curl. As long as we set the content type to application/pdf, the gateway and our deploy tool will handle this automatically.

Now find the ExampleResume.pdf used in the previous post. If you created one yourself, use that instead. Then upload a resume with curl:

> curl -i -X POST -H "Content-Type: application/pdf" --data-binary "@ExampleResume.pdf" "https://<GATEWAY-API-ID>.execute-api.<REGION>.amazonaws.com/latest?fileName=ExampleResume.pdf"

This should respond immediately with a 204 HTTP status code. Next, fire another request to check the status of the SQS queue.

> curl -i -X GET -H "Accept: application/json" https://<GATEWAY-API-ID>.execute-api.<REGION>.amazonaws.com/latest

Depending on how quickly you send this request, it may return an empty array. If you wait more than a few seconds, the API responds with the processed resume. Remember, the actual processing happens in the background. Our API is meant for real-time consumption, so you can come back and check on the queue at any time.
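
Instead of repeating the request by hand, a client could poll the GET endpoint until a message shows up. The following is a minimal sketch under a few assumptions: pollForMessages is a hypothetical helper, the check function is supplied by the caller, and the default attempt count and delay are arbitrary.

```javascript
// Hypothetical helper: calls `check` until it yields messages or gives up.
async function pollForMessages(check, { attempts = 5, delayMs = 2000 } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    const messages = await check();
    if (messages.length > 0) {
      return messages;
    }
    if (attempt < attempts) {
      // Wait before asking again so the API is not hammered
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return []; // nothing arrived within the allotted attempts
}

// Usage sketch (needs Node 18+ for the global fetch; replace the placeholders):
// const url = "https://<GATEWAY-API-ID>.execute-api.<REGION>.amazonaws.com/latest";
// pollForMessages(async () => (await (await fetch(url)).json()).Messages)
//   .then((messages) => console.log(messages));
```

Passing the check as a function keeps the polling loop independent of how the request is made, which also makes it easy to exercise with a stub.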

You can clear the queue once you are done reviewing resumes.

> curl -i -X DELETE -H "Accept: application/json" https://<GATEWAY-API-ID>.execute-api.<REGION>.amazonaws.com/latest

If you somehow lose the URL with the GATEWAY-API-ID and REGION, log in to the AWS console, go to API Gateway, and click resume-uploader-api. The Invoke URL can be found under 'Stages' (click on the latest).
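
The invoke URL follows the same pattern used in the curl commands, so a client can also assemble it from the API ID and region. Here is a small sketch; buildInvokeUrl is a hypothetical helper, the "abc123" API ID is made up, and the file name is percent-encoded so names with spaces survive the query string.

```javascript
// Hypothetical helper that assembles the invoke URL for the "latest" stage.
function buildInvokeUrl(apiId, region, fileName) {
  const base = `https://${apiId}.execute-api.${region}.amazonaws.com/latest`;
  if (fileName === undefined) {
    return base; // GET and DELETE take no query string
  }
  // Percent-encode the file name so spaces and symbols are transmitted safely
  return `${base}?fileName=${encodeURIComponent(fileName)}`;
}

console.log(buildInvokeUrl("abc123", "us-east-1", "Example Resume.pdf"));
// https://abc123.execute-api.us-east-1.amazonaws.com/latest?fileName=Example%20Resume.pdf
```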

Wrapping Up

In part one of this series, we deployed Lambda step functions in AWS using Claudia.js. We then built a state machine, deployed our step function to AWS, and tested it. In this part, we built the API interface.

Asynchronous background processing via step functions helps reduce the perceived latency of complex solutions. The API we put in place simply moves this complexity elsewhere so it does not get blocked and force actual users to wait.

Happy coding!

P.S. If you liked this post, subscribe to our JavaScript Sorcery list for a monthly deep dive into more magical JavaScript tips and tricks.

P.P.S. If you need an APM for your Node.js app, go and check out the AppSignal APM for Node.js.
