Limitations of AWS EC2 Image Builder Lifecycle Policies

mbacchi

Matt Bacchi

Posted on August 13, 2024

EC2 Image Builder Lifecycle Policies is a fairly new AWS feature that enables automatic cleanup of old EC2 Image Builder pipeline artifacts. If you haven't used EC2 Image Builder, it's quite handy for creating customized Amazon Machine Images (AMIs) for your EC2 deployments.

EC2 Image Builder can be a really valuable tool that simplifies the AMI build process. But a number of artifacts are left behind after your AMI is built, and they potentially cost money and require manual effort to clean up.

Enter EC2 Image Builder Lifecycle Policies. This feature is advertised as a way to automatically tidy up these artifacts. Unfortunately, it's almost unusable if you build a large number of AMIs from numerous recipes.

Below I'll describe a high-frequency use case that illustrates the limitations of EC2 Image Builder Lifecycle Policies. You can draw your own conclusions about the utility of the service as it stands today.

Standard use case

The process of building AWS EC2 Amazon Machine Images (AMIs) can be rather involved. You start with a base image, add required packages and user data, then build, test, and finally deploy your custom AMI. The deployment step isn't even straightforward, as it potentially requires disk volumes, network interfaces, and additional resources which necessitate experimentation. Even after you've done all this, the job isn't complete: you still have to manage the state of these AMIs and their related resources, all of which eventually need to be cleaned up. And because AMIs tend to be created frequently to meet security objectives, the entire process can occur at a high cadence.

Why so many AMIs?

We all know how frequently CVEs (Common Vulnerabilities and Exposures) can be released, and how quickly management expects them to be mitigated. Lower severity security updates come out at an even more unrelenting pace. For production infrastructure at enterprise scale, updates need to be routinely applied and be extremely reliable. This process is simplified by automating it all with an EC2 Image Builder pipeline executed on a regular schedule.

But after these AMIs and related resources have been superseded by newer ones they need to be removed. Until now there was no method provided by AWS to perform this action. You needed to develop your own tooling for this purpose.

EC2 Image Builder Lifecycle Policies

Lifecycle Policies adds a necessary (heretofore missing) feature to EC2 Image Builder. Having AWS remove your unused resources is a great idea. But the execution leaves something to be desired. For the sake of illustration, let's also define what "related resources" means here.

Every AMI built via EC2 Image Builder requires that an image recipe and an image pipeline be created. An image recipe consists of AWS managed components or custom (user-generated) components that perform the AMI build. The pipeline runs your recipe and, on success, generates an output image, which references the output AMI, the infrastructure configuration, the distribution settings, and a CloudWatch log stream (which includes pipeline output for debugging purposes).

When you modify the image recipe, a new recipe version is created and the previous versions become obsolete. The now unused recipe and its related resources (such as the output image and the AMI) remain active in AWS indefinitely and won't be removed until you remove them yourself. (I'm calling out this specific point for a reason that will become obvious later.)

If you thought you merely needed to delete the AMI when you were done with it, you would leave all these related Image Builder resources in your account. (Granted they don't cost a lot, but it makes sense to remove unused resources, if only to avoid hitting quota limits at some point in the future.)
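To get a feel for how quickly obsolete recipe versions pile up, here is a minimal Python sketch that groups recipe versions by name and reports everything except the newest version of each. It assumes versions are plain numeric `major.minor.patch` strings, and the recipe names in the usage example are hypothetical.

```python
from collections import defaultdict


def version_key(version):
    """Turn '1.0.10' into (1, 0, 10) so versions sort numerically.

    Assumption: versions are purely numeric major.minor.patch strings.
    """
    return tuple(int(part) for part in version.split("."))


def superseded_versions(recipe_versions):
    """Map each recipe name to all of its versions except the newest one.

    recipe_versions is an iterable of (name, semanticVersion) pairs.
    Recipes with only a single version are left out of the result.
    """
    by_name = defaultdict(set)
    for name, version in recipe_versions:
        by_name[name].add(version)

    result = {}
    for name, versions in by_name.items():
        ordered = sorted(versions, key=version_key)
        if len(ordered) > 1:
            result[name] = ordered[:-1]  # everything but the newest
    return result


if __name__ == "__main__":
    # Hypothetical recipe names and versions, for illustration only.
    known = [
        ("ubuntu-recipe", "1.0.0"),
        ("ubuntu-recipe", "1.0.2"),
        ("ubuntu-recipe", "1.0.10"),
        ("al2023-recipe", "2.0.0"),
    ]
    print(superseded_versions(known))
```

Each version reported here still has its own output images and AMIs sitting in the account, which is exactly what a lifecycle policy would need to cover.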

While EC2 Image Builder Lifecycle Policies makes this last bit of cleanup somewhat more straightforward, it is not a "one size fits all" solution.

Limitations of the current Lifecycle Policies service

To create a lifecycle policy for AMI image resources, the documentation says you must "apply lifecycle rules to image resources based on the recipe that created them, select up to 50 recipe versions for the policy."

This is unwieldy because you have to add new recipe versions explicitly each time you create a new image recipe. If you use Infrastructure as Code (IaC) tools to do this (such as Terraform, CloudFormation or CDK), it becomes slightly less manual, but it still requires manual effort (or at least in-depth awareness of the process). And even if you add these recipes to your Lifecycle Policy manually, there's a limit of 50 recipe versions in the policy. This isn't as irritating if you only have a dozen recipes in total, but the problem intensifies with scale.

Note that I'm referring to lifecycle policies for AMI image resources. I don't currently use EC2 Image Builder to create container image resources.

The limitations as I see them are:

  1. The LifecyclePolicyResourceSelection API documentation allows a maximum of 50 recipes as selection criteria. This 50 recipe limit seems quite arbitrary. For a user with a large number of image recipes it is an extremely small limit, and adding every recipe to this rule scope parameter becomes a burden.

    I've verified this is not adjustable in the AWS Service Quotas self service quota increase dashboard in the AWS Console (or the Service Quotas API).

  2. On the EC2 Image Builder Lifecycle Policies page of the AWS Console, when selecting the recipes for the Rule Scope, the drop-down list only displays a few current recipes. It shows a spinner and indicates it is "Loading Recipes" (see screenshot), but the additional recipes never show up.

[Screenshot: Rule Scope]
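To make the first limitation concrete, here is a minimal Python sketch that builds the recipe selection from explicit name/semanticVersion pairs and fails early when the list exceeds 50 entries. The function name and the error message are my own, and the `recipes` key is my reading of the LifecyclePolicyResourceSelection structure, so check it against the API reference before relying on it.

```python
MAX_RECIPES_PER_POLICY = 50  # the limit quoted from the documentation above


def build_resource_selection(recipe_versions):
    """Build a resource selection from (name, semanticVersion) pairs.

    Duplicate pairs are dropped while keeping first-seen order.
    Raises ValueError when the selection would exceed the 50 recipe limit.
    """
    # dict.fromkeys de-duplicates the tuples and preserves their order.
    unique_pairs = list(dict.fromkeys(recipe_versions))

    if len(unique_pairs) > MAX_RECIPES_PER_POLICY:
        raise ValueError(
            f"{len(unique_pairs)} recipe versions selected, but a lifecycle "
            f"policy accepts at most {MAX_RECIPES_PER_POLICY}"
        )

    return {
        "recipes": [
            {"name": name, "semanticVersion": version}
            for name, version in unique_pairs
        ]
    }


if __name__ == "__main__":
    # Every version has to be listed explicitly; there is no wildcard.
    selection = build_resource_selection(
        [("ubuntu-recipe", "1.0.0"), ("ubuntu-recipe", "1.0.1")]
    )
    print(selection)
```

A recipe that is rebuilt weekly produces 52 versions a year, so a single recipe can outgrow one policy in under twelve months.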

Recommended Solutions (Feature Requests)

  1. An obvious solution to the first limitation above (the 50 recipe maximum problem) is allowing more than 50 recipes in the selection criteria Rule Scope.

    Of course, there's a much more user-friendly and comprehensive approach. The most ergonomic proposal would be to allow a wildcard or pattern matching syntax. This would allow power users to identify their recipes in the Rule Scope with something that looks like the JSON below (the API documentation requires name/semanticVersion key/value pairs). Without it, you still have to list every single version you've created that should be subject to the rule scope.

    {
        "name": "ubuntu-recipe",
        "semanticVersion": "*"
    }
    

    My rationale for these suggestions is as follows: if AWS only raised the limit above 50 recipes, users would still have to insert every single recipe version into the LifecyclePolicyResourceSelection mentioned above. A wildcard is much more approachable, as users would only need to add the recipe name and a wildcard to the selection criteria Rule Scope once, and never have to think about it again.

    This single feature would make EC2 Image Builder Lifecycle Policies much more useful and less of a burden for users who actively manage recipes and need a tool to clean these resources up (without resorting to writing something from scratch.)

  2. The AWS Console Rule Scope drop-down problem looks like a pagination bug: the first 6 results are returned in the request, but pagination never returns more than those 6. In fact, if you open your browser developer console, you can see that 6 results are returned from the ListImageRecipes request, and no more ever appear.

    Thankfully the AWS CLI command aws imagebuilder update-lifecycle-policy (and obviously the API) allows you to provide a list of recipes, but the AWS Console is typically the first exposure that users have to how a feature works. Fixing this issue would be helpful to users newly exposed to Lifecycle Policies.
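Until something like the wildcard exists, the pattern can be approximated on the client side. The sketch below is one way to do that in Python, not an AWS feature: it pages through ListImageRecipes by following `nextToken` (the step the Console drop-down appears to skip) and expands a version pattern into explicit name/semanticVersion pairs. It assumes `client` behaves like a boto3 `imagebuilder` client, that each summary carries `name` and `arn`, that the version is the last path segment of the recipe ARN, and that versions are numeric `major.minor.patch` strings. The helper names are hypothetical.

```python
import fnmatch


def list_recipe_versions(client):
    """Yield (name, semanticVersion) for every recipe owned by this account.

    Keeps requesting pages until no nextToken is returned.
    """
    kwargs = {"owner": "Self"}
    while True:
        page = client.list_image_recipes(**kwargs)
        for summary in page.get("imageRecipeSummaryList", []):
            # Assumption: the ARN ends in ".../image-recipe/<name>/<version>".
            version = summary["arn"].rsplit("/", 1)[-1]
            yield summary["name"], version
        token = page.get("nextToken")
        if not token:
            return
        kwargs["nextToken"] = token


def expand_pattern(versions, name, version_pattern="*"):
    """Expand a recipe name plus version pattern into explicit entries.

    versions is an iterable of (name, semanticVersion) pairs. The result
    is sorted oldest to newest.
    """
    matches = {
        v
        for n, v in versions
        if n == name and fnmatch.fnmatchcase(v, version_pattern)
    }
    ordered = sorted(matches, key=lambda v: tuple(int(p) for p in v.split(".")))
    return [{"name": name, "semanticVersion": v} for v in ordered]
```

With boto3 installed, `expand_pattern(list_recipe_versions(boto3.client("imagebuilder")), "ubuntu-recipe")` would produce the explicit list to supply as the policy's recipes. The 50 entry cap still applies to the result, so this only removes the bookkeeping, not the limit.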

Conclusion

While EC2 Image Builder Lifecycle Policies has the potential to be a great service, it is currently hampered by the user experience (UX). Adding functionality to make adding recipes to policies easier would be a major improvement for early adopters and power users.
