Parallel Lambda execution with AWS State Machine
Ziad Osman
Posted on August 10, 2024
This is part 2 of a two-part blog series about real-life scenarios that can be solved with AWS Step Functions state machines. In part 1, we discussed how to log CloudTrail events to DynamoDB by using a state machine to write directly to DynamoDB. If you're interested in that part, please refer to this link.
In this blog, we're going to use a state machine to feed the output of one Lambda function into another. That output will be a list of objects, and for each object, the state machine will invoke the second Lambda function in parallel. This is useful whenever you need to process many objects concurrently rather than one after another.
Prerequisites
In addition to the state machine, I'm going to create two Lambda functions. The first is called OutputLambda and produces the list; the second is called InputLambda and processes one element of that list.
We'll come back to the code of the Lambda functions in a moment, but first, let's look at our state machine.
1 - State machine
The state machine has two states. The first is a Task state that invokes OutputLambda. When it completes, execution moves on to the second state. The second state is of type Map, which runs its iterator once for each element of a list, with the iterations running in parallel.
Here is what our state machine definition looks like:
{
  "Comment": "parallel execution of lambda functions demo",
  "StartAt": "FirstState",
  "States": {
    "FirstState": {
      "Type": "Task",
      "Resource": "<OutputLambdaArn>",
      "Next": "IterateOverList"
    },
    "IterateOverList": {
      "Type": "Map",
      "ItemsPath": "$.list",
      "Iterator": {
        "StartAt": "SecondState",
        "States": {
          "SecondState": {
            "Type": "Task",
            "Resource": "<InputLambdaArn>",
            "End": true
          }
        }
      },
      "End": true
    }
  }
}
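If you would rather not hard-code the ARNs, you can generate the same definition from Python. The following is a minimal sketch. build_definition is a hypothetical helper, and the ARNs you pass in are your own. It also exposes the Map state's MaxConcurrency field. Its default value of 0 places no explicit limit on how many iterations run at once; a positive value caps it.

```python
import json


def build_definition(output_lambda_arn, input_lambda_arn, max_concurrency=0):
    """Return the state machine definition above as a JSON string.

    Hypothetical helper: the ARNs are supplied by the caller.
    """
    definition = {
        "Comment": "parallel execution of lambda functions demo",
        "StartAt": "FirstState",
        "States": {
            "FirstState": {
                "Type": "Task",
                "Resource": output_lambda_arn,
                "Next": "IterateOverList",
            },
            "IterateOverList": {
                "Type": "Map",
                "ItemsPath": "$.list",
                # 0 = no explicit cap on concurrent iterations
                "MaxConcurrency": max_concurrency,
                "Iterator": {
                    "StartAt": "SecondState",
                    "States": {
                        "SecondState": {
                            "Type": "Task",
                            "Resource": input_lambda_arn,
                            "End": True,
                        }
                    },
                },
                "End": True,
            },
        },
    }
    return json.dumps(definition, indent=2)


if __name__ == "__main__":
    # Placeholder ARNs for illustration only
    print(build_definition("<OutputLambdaArn>", "<InputLambdaArn>", max_concurrency=2))
```

The resulting string is what you would pass as the definition argument of boto3's Step Functions create_state_machine call, together with a name and an IAM role ARN.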
Let's break down what we have:
The first state, FirstState, is very simple: it specifies a Lambda function as its resource and sets IterateOverList as its next state.
IterateOverList is of type Map, which is what gives us the parallelism described above. Its ItemsPath is "$.list", meaning the Map state iterates over the "list" field of its input, which is the output of OutputLambda. This means that OutputLambda needs to return a JSON object with a key named "list" whose value is an array (we'll get back to this in a moment).
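To make the ItemsPath behavior concrete, here is a simplified Python sketch of how "$.list" picks the array out of OutputLambda's output. resolve_items_path is a hypothetical helper that only understands dot-separated paths. The real service accepts full JsonPath syntax. If OutputLambda returned its array under a different key, the Map state could not resolve the path and the execution would fail.

```python
def resolve_items_path(state_input, items_path):
    """Simplified lookup for dot-only paths such as "$.list".

    Hypothetical helper: real Step Functions paths support full JsonPath,
    which this sketch does not.
    """
    if not items_path.startswith("$"):
        raise ValueError(f"path must start with '$': {items_path}")
    value = state_input
    # "$.list" -> ["list"], "$.a.b" -> ["a", "b"], "$" -> []
    for key in [k for k in items_path[1:].split(".") if k]:
        value = value[key]  # raises KeyError if the key is missing
    if not isinstance(value, list):
        raise TypeError(
            f"{items_path} must point to an array, got {type(value).__name__}"
        )
    return value


# The output of OutputLambda becomes the input of IterateOverList
print(resolve_items_path({"list": ["apple", "orange", "pear"]}, "$.list"))
```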
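The Iterator section is where the magic happens. It works like a foreach loop over the elements of the "list" array, except that the iterations run in parallel. For each element, it runs the task we defined, passing that element in as the task's input. Inside the Iterator section, we have defined that task as InputLambda. (Newer versions of the Amazon States Language call this field ItemProcessor. Iterator is deprecated but still accepted.)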
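If it helps to see the whole flow in one place, here is a local Python simulation of FirstState followed by IterateOverList. It is only a stand-in for reasoning about the behavior, not how Step Functions runs internally. output_lambda, input_lambda and run_locally are hypothetical local functions, and a thread pool plays the role of the Map state's parallel iterations. As with a real Map state, the output is an array of each iteration's result, in the same order as the input items.

```python
from concurrent.futures import ThreadPoolExecutor


def output_lambda(event, context):
    # Local stand-in for OutputLambda
    return {"list": ["apple", "orange", "pear"]}


def input_lambda(event, context):
    # Local stand-in for InputLambda: event is a single list element
    return f"processed {event}"


def run_locally(state_input, max_concurrency=None):
    """Simulate FirstState followed by the IterateOverList Map state."""
    first_output = output_lambda(state_input, None)
    items = first_output["list"]  # what ItemsPath "$.list" selects
    # max_workers=None lets the pool pick a default size;
    # a number mimics MaxConcurrency
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map runs the calls concurrently but yields results in input order
        return list(pool.map(lambda item: input_lambda(item, None), items))


print(run_locally({}))
```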
Now, let's move on to the Lambda functions themselves.
2 - OutputLambda
This function only needs to return an object containing a list. As defined by ItemsPath in our state machine definition, that list must be stored under the key "list". Here is what the Lambda function could look like in Python 3:
def lambda_handler(event, context):
    fruits = ["apple", "orange", "pear"]
    return {'list': fruits}
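The example above returns plain strings. If each item needs to carry several fields, the list can just as well hold JSON objects (Python dicts), and each dict becomes the event of one InputLambda invocation. This is a minimal sketch. The "name" and "quantity" fields are hypothetical example data.

```python
def lambda_handler(event, context):
    # Each dict in the list becomes the event of one InputLambda invocation.
    # "name" and "quantity" are hypothetical example fields.
    fruits = [
        {"name": "apple", "quantity": 3},
        {"name": "orange", "quantity": 5},
        {"name": "pear", "quantity": 2},
    ]
    # The key must match ItemsPath "$.list" in the state machine definition
    return {"list": fruits}
```

Keep in mind that Step Functions limits the size of state input and output to 256 KB, so a very large list may not fit in a single Lambda response.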
Next up, let's look at our InputLambda.
3 - InputLambda
Our state machine will invoke this Lambda function three times (since the list in our example has three items), with the invocations running concurrently. Each invocation receives a different element of the list. Again using Python 3, here is how to retrieve that element in lambda_handler:
def lambda_handler(event, context):
    print(f"fruit of the day: {event}.")
Yep, as simple as that! The list element is passed in as the event itself, so event is just the string "apple", "orange" or "pear".
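The handler above returns nothing, so each iteration's result is null. If you want to use the results later in the workflow, you can return a value instead. The Map state collects the return values into its output array. The sketch below also accepts the dict-shaped items used in the OutputLambda variant sketched earlier. The "name" and "fruit" keys are hypothetical.

```python
def lambda_handler(event, context):
    # event is one element of the "list" array: a plain string such as
    # "apple", or a dict such as {"name": "apple", "quantity": 3}.
    # The "name" and "fruit" keys are hypothetical.
    if isinstance(event, dict):
        name = event.get("name", "unknown")
    else:
        name = event
    print(f"fruit of the day: {name}.")
    # This return value becomes this iteration's entry in the
    # Map state's output array.
    return {"fruit": name}
```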
Conclusion
In this blog, I showed you how to use a state machine to chain multiple Lambda functions together. We saw how to feed the output of one Lambda function into another, and how to invoke a Lambda function multiple times in parallel, which is useful for processing many items concurrently.