Parallel Lambda execution with AWS State Machine

This is part 2 of a two-part blog series about real-life scenarios that can be solved with an AWS state machine (AWS Step Functions). In part 1, we discussed how to log CloudTrail events to DynamoDB by having a state machine write directly to DynamoDB. If you're interested in that scenario, please refer to part 1.

In this blog, we're going to use a state machine to feed the output of one Lambda function into another. That output will be a list of objects, and for each object in the list, the state machine will invoke a second Lambda function, with all of those invocations running in parallel. This is useful whenever you need to process many items concurrently instead of one after another.

Prerequisites

In addition to the state machine, I'm going to create two Lambda functions: the first will be called OutputLambda, and the second InputLambda.

We'll come back to the code of the Lambda functions in a second, but first, let's look at our state machine.

1 - State machine

The state machine has two states. The first, FirstState, invokes OutputLambda and moves on to the second state when it completes. The second, IterateOverList, is of type Map, which means it runs its inner steps once for each element in a list, with the iterations running in parallel.

Here is what our state machine definition looks like:

{
  "Comment": "parallel execution of lambda functions demo",
  "StartAt": "FirstState",
  "States": {
    "FirstState": {
      "Type": "Task",
      "Resource": "<OutputLambdaArn>",
      "Next": "IterateOverList"
    },
    "IterateOverList": {
      "Type": "Map",
      "ItemsPath": "$.list",
      "Iterator": {
        "StartAt": "SecondState",
        "States": {
          "SecondState": {
            "Type": "Task",
            "Resource": "<InputLambdaArn>",
            "End": true
          }
        }
      },
      "End": true
    }
  }
}
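If you would rather not paste ARNs into the definition by hand, one option is to generate the same definition from Python. This is a sketch under my own assumptions, not part of the original setup: `build_definition` is a hypothetical helper, and the optional `MaxConcurrency` field (a real Map state field) caps how many iterations run at once; when it is omitted, Step Functions applies no explicit cap.

```python
import json
from typing import Optional


def build_definition(output_lambda_arn: str, input_lambda_arn: str,
                     max_concurrency: Optional[int] = None) -> str:
    """Return the state machine definition above as a JSON string (hypothetical helper)."""
    map_state = {
        "Type": "Map",
        # JSONPath into the output of FirstState; it must point at an array.
        "ItemsPath": "$.list",
        "Iterator": {
            "StartAt": "SecondState",
            "States": {
                "SecondState": {
                    "Type": "Task",
                    "Resource": input_lambda_arn,
                    "End": True,
                }
            },
        },
        "End": True,
    }
    if max_concurrency is not None:
        # Optional cap on how many iterations run concurrently.
        map_state["MaxConcurrency"] = max_concurrency

    definition = {
        "Comment": "parallel execution of lambda functions demo",
        "StartAt": "FirstState",
        "States": {
            "FirstState": {
                "Type": "Task",
                "Resource": output_lambda_arn,
                "Next": "IterateOverList",
            },
            "IterateOverList": map_state,
        },
    }
    return json.dumps(definition, indent=2)


if __name__ == "__main__":
    # Placeholder ARNs; substitute the ARNs of your own functions.
    print(build_definition("<OutputLambdaArn>", "<InputLambdaArn>", max_concurrency=3))
```

The resulting string can be passed as the `definition` argument of the boto3 Step Functions client's `create_state_machine` call, together with a `name` and an IAM `roleArn` that is allowed to invoke both functions.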

Let's break down what we have:

The first state, FirstState, is very simple: it is a Task state that specifies OutputLambda as its resource and sets IterateOverList as its next state.

IterateOverList is defined as type Map, which is what gives us the parallelism described earlier. Its ItemsPath is “$.list”, a JSONPath expression pointing at the “list” field of the state's input. Since that input is the output of FirstState, OutputLambda needs to return an object with a key named “list” whose value is an array for this state to pick it up (we'll get back to this in a second).
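To make the ItemsPath behaviour concrete, here is a small local sketch of what the Map state does with `$.list`: it looks up the `list` key in the output of FirstState and expects an array there. `resolve_items_path` is a hypothetical helper that only understands simple dotted paths such as `$.list` or `$.data.items`; it is not the JSONPath engine Step Functions actually uses.

```python
from typing import Any


def resolve_items_path(state_input: dict, items_path: str) -> list:
    """Follow a simple dotted JSONPath like '$.list' (hypothetical, simplified)."""
    if not items_path.startswith("$"):
        raise ValueError(f"path must start with '$': {items_path}")
    value: Any = state_input
    # '$.list' -> ['list']; a bare '$' selects the whole input.
    for key in [part for part in items_path[1:].split(".") if part]:
        value = value[key]  # KeyError if OutputLambda did not return this key
    if not isinstance(value, list):
        raise TypeError(f"{items_path} must resolve to a list, got {type(value).__name__}")
    return value


# OutputLambda's return value becomes the input of IterateOverList.
first_state_output = {"list": ["apple", "orange", "pear"]}
print(resolve_items_path(first_state_output, "$.list"))  # ['apple', 'orange', 'pear']
```

In the real service, a missing key or a non-array value makes the Map state fail at runtime, which is why the key returned by OutputLambda has to match ItemsPath exactly.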

The Iterator section is where the magic happens. It works like a foreach loop: it goes over the elements of the “list” array and runs the states defined inside it once for each element, except that the iterations run in parallel rather than one after another. Here, the Iterator contains a single Task state, SecondState, which invokes InputLambda.
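Here is a local approximation of that loop, assuming a thread pool is a fair stand-in for the concurrent Lambda invocations the Map state makes. `simulate_map` and `fake_input_lambda` are hypothetical names for illustration only. The property the sketch deliberately mirrors is that the Map state's output is an array of the iteration results in the same order as the input list, regardless of which iteration finishes first.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def simulate_map(iterator: Callable[[object], object], items: List[object],
                 max_concurrency: Optional[int] = None) -> List[object]:
    """Run `iterator` once per item concurrently; results keep input order."""
    # max_concurrency must be a positive int, or None to let the executor pick a size.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # Executor.map yields results in input order, like the Map state output.
        return list(pool.map(iterator, items))


def fake_input_lambda(fruit: object) -> str:
    # Longer names sleep longer, so iterations finish out of input order.
    time.sleep(0.01 * len(str(fruit)))
    return f"processed {fruit}"


if __name__ == "__main__":
    print(simulate_map(fake_input_lambda, ["apple", "orange", "pear"]))
    # ['processed apple', 'processed orange', 'processed pear']
```

Even though "pear" finishes first here, it still ends up last in the result, just as it would in the Map state's output array.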

Now, let's move on to the Lambda functions themselves.

2 - OutputLambda

This function only needs to return a list of objects. As defined by the ItemsPath in our state machine definition, that list must be returned under the key “list”. Here is what the Lambda function could look like in Python 3:

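def lambda_handler(event, context):
    # Each element of this list becomes the input of one InputLambda invocation.
    fruits = ["apple", "orange", "pear"]
    # The key must match ItemsPath ("$.list") in the state machine definition.
    return {'list': fruits}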
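The example returns plain strings, but the Map state iterates over any JSON array, including an array of objects. A variant that returns dictionaries might look like the following; the `name` and `quantity` fields are made up for illustration. Whatever the handler returns must be JSON-serializable, since Step Functions passes it between states as JSON.

```python
import json


def lambda_handler(event, context):
    # Hypothetical payload: each dict becomes the event of one InputLambda run.
    fruits = [
        {"name": "apple", "quantity": 3},
        {"name": "orange", "quantity": 5},
        {"name": "pear", "quantity": 2},
    ]
    # Same top-level key as before, so ItemsPath "$.list" still matches.
    return {"list": fruits}


if __name__ == "__main__":
    # Local check: the return value must survive a JSON round trip.
    output = lambda_handler({}, None)
    print(json.dumps(output))
```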

Next up, let's look at our InputLambda.

3 - InputLambda

This Lambda function will be invoked three times, concurrently, by our state machine (since the list in our example has three items). Each invocation receives a different element of the list. Again using Python 3, here is how to access that element in lambda_handler:

def lambda_handler(event, context):
    # For a list of strings, event is the element itself, e.g. "apple".
    print(f"fruit of the day: {event}.")

Yep, as simple as that! The list element is available as the event.
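One detail worth knowing: the handler above returns None, so each iteration's result, and therefore the Map state's output, is a list of nulls. If you want IterateOverList to produce something useful, have InputLambda return a value. The sketch below assumes the event is either a plain string (as with the original OutputLambda) or an object with a `name` key (a hypothetical shape); the returned `fruit` and `status` fields are likewise illustrative.

```python
def lambda_handler(event, context):
    # Accept either a bare string ("apple") or an object ({"name": "apple", ...}).
    fruit = event["name"] if isinstance(event, dict) else event
    print(f"fruit of the day: {fruit}.")
    # The returned value becomes this iteration's entry in the Map state's output array.
    return {"fruit": fruit, "status": "processed"}


if __name__ == "__main__":
    for item in ["apple", {"name": "pear", "quantity": 2}]:
        print(lambda_handler(item, None))
```

With this version, the output of IterateOverList for the three-fruit example would be an array of three such objects, in list order.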

Conclusion

In this blog, I showed you how to use a state machine to chain multiple Lambda functions together. We saw how to feed the output of one Lambda function into another as its input, and how to invoke a Lambda function multiple times in parallel, which is useful when you have many items to process at once.

Ziad Osman