Managing AWS Lambda Versions with AWS Step Functions: A comprehensive guide

maureen_plank

Maureen Plank

Posted on April 19, 2024

Managing AWS Lambda Versions with AWS Step Functions: A comprehensive guide

Each AWS account has a regional default storage limit for Lambda and Lambda Layers (Zip archives) of 75 GB. This sounds a lot, and it can even be increased by demand, but many functions combined with frequent deployments might lead to storage pollution quite quickly. Therefore, it makes sense to clean up old deployment packages regularly. This blogpost introduces an AWS Step Functions state machine that is designed to automate the identification and deletion of older Lambda function versions.

Starting point

AWS creates a new version for each lambda function deployment and keeps them all available – just in case an older one is needed in future. Every deployment package is stored and counts to the overall size limit of 75 GB. Even though this amount of storage is quite large it is advisable (and recommended by AWS) to clean up older versions which are no longer needed on a regular basis.

As of now, AWS does not provide a direct solution to manage these versions, prompting developers to create custom solutions. One example of a custom solution is the use of a Lambda function like the Lambda Janitor by Yan Cui. Our idea for an implementation was a state machine – an automated and visual solution for the Lambda version management.

This state machine handles all Lambda functions (optionally targeting specific ones based on predefined criteria) and deletes / removes old versions based on threshold which determines how many versions should be kept.
A visual representation of the final workflow as well as a brief explanation of the most important steps is given below:

Image description

Retrieving and filtering the function names

The workflow begins with calling and listing the names of all existing Lambda functions within the AWS region. The “lambda:listFunctions” API call returns up to 50 items per call and a marker token in case more results are available. To take this into account the paginator pattern after the “Iterate Functions” map state is being used.

For the first step we transform the result with the ResultSelector to extract only the function names. By doing that we get an output like this:

Image description

To iterate through the array of lambda function names in the list, we use a Map state, which allows us to set a concurrency limit of 5. This limit helps to control the number of simultaneous requests and prevents throttle calls.

In our case we just want to clean up versions of Lambda functions which belong to our project. Others which are managed by the platform team need to stay untouched.

The Lambda functions from our project have a specific prefix in the name, that makes it easy for us to filter through all functions in the region. To do that, we added a “Choice” step to determine whether the state machine should continue the version cleanup or ignore the current Lambda function. This prefix can also be retrieved from the state machine invocation event.

Image description

Getting all the available versions

In the following step, all available versions are retrieved using the “lambda:listVersionsByFunction” API call. The outcome of this step is shown below – a list of all versions and the name of the function currently processed.

Image description

Keeping the function name in this step and adding it to the ResultPath is mandatory because we will need it again in a few steps when deleting the versions.

The “lambda:listVersionsByFunction” API call returns up to 50 items and a marker token to retrieve more results. To take care of the pagination – the paginator pattern would have been required again. However, we have decided not to implement it to keep the workflow simpler.

There are not that many deployments in our environment that we would reach more than 50 versions per Lambda function before the workflow runs the next time. In addition, the workflow which is triggered by an AWS EventBridge scheduler could run more often to avoid this situation – for instance, every day instead of every week – just as an example.

Removing the $LATEST version and get version count

The Lambda API returns the versions in a way that the latest (called $LATEST 😊) one comes first followed by all other versions in ascending order – from oldest to newest. We decided not to rely on this implicit order but to exclude the $LATEST version explicitly from all further processing steps as well as to sort the remaining version numbers – just in case AWS is going to change the response format.

Unfortunately, AWS Step Functions does only provide a limited set of array and JSON processing functions – so called intrinsic functions. A Lambda function is required to perform the necessary work. Hopefully, AWS adds some more power to the intrinsic function’s palette so that these kind of simple helper functions will no longer be necessary in future.

Image description

The outcome of this step consists of the lambda function name, the array of versions to be processed and their count.

Check number of available versions and remove outdated ones

In the next step, the state machine compares the number of available versions against a threshold. This is done with a “Choice” State. In our case we wanted to always keep the three most recent versions + the $LATEST one, so the “version_count” is compared with 3:

Image description

If the number of available versions exceeds the threshold we set, the oldest version (first position in the array) is deleted using the “lambda: deleteFunction” API call in the step “Delete oldest version”.

After that, a modification of the payload is required as the deleted version number and the version count itself needs to be adapted. A “Pass” state is used to remove the oldest version (the first position in the array with the versions) and to decrease the version counter.

Image description

The execution keeps checking and deleting old versions until their number has reached the threshold value (three in this example).

Image description

One remaining thing to consider related to Lambda Aliases

A Lambda alias reference one or two function versions which cannot be deleted if the alias exists. It is possible to retrieve all aliases of a function via the “lambda:listAliases” API call. One way to make sure to take aliases into account would be to get all involved versions and to remove them from the versions array before the version deletion action. This requires some custom code in a Lambda function.

Another option – which we have implemented – consists in defining an error handler for the “Delete old version” step. This one captures the “Lambda.ResourceConflictException” which is thrown when trying to delete a version which cannot be deleted for instance because of an alias reference. This error handling makes sure that the state machine does not fail in case of alias references or other unforeseen problems when trying to perform the deletion.

Image description

The state machine finishes after all Lambda functions which have been supposed to be processed do not possess more than the most recent versions – all others have been deleted. This mechanism helps to prevent deployment package pollution when used regularly.

Conclusion

In this guide on managing AWS Lambda versions with AWS Step Functions, I’ve shown how developers can leverage automation to streamline the management of Lambda function versions. By automating these processes, teams can focus more on development rather than maintenance, maintaining operational efficiency and resource optimization. Overall, this solution provides a practical, visual tool for managing Lambda functions effectively, helping users navigate the complexities of cloud operations with greater ease.

💖 💪 🙅 🚩
maureen_plank
Maureen Plank

Posted on April 19, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related