AWS Lambda SnapStart - Part 5 Measuring priming, end to end latency and deployment time with Java 11


Vadym Kazulkin

Posted on January 16, 2023


Introduction

In the first, second, third and fourth part of the series we talked about SnapStart in general and ran the first tests comparing the cold start of a Lambda function written in plain Java with AWS SDK for Java version 2 against the same function using the Micronaut, Quarkus and Spring Boot frameworks, with and without SnapStart enabled. We saw that enabling SnapStart led to a huge decrease in cold start times in all cases for our example application, but these cold starts were still quite noticeable. In this part of the series we are going to discuss a further optimization technique called priming, measure the end-to-end AWS API Gateway request latency for cold starts with SnapStart enabled, and explore how much additional time the deployment of a Lambda function with SnapStart enabled takes.

Priming

Before we talk about the priming technique, let's summarize the measured cold starts (in milliseconds) with SnapStart enabled and without priming:

| Framework | p50 | p90 | p99 |
| --- | --- | --- | --- |
| Pure Lambda | 1266.05 | 1306.85 | 1326.81 |
| Micronaut | 1468.18 | 1595.61 | 1641.23 |
| Quarkus | 1337.16 | 1374.76 | 1473.87 |
| Spring Boot | 1222.52 | 1877.08 | 1879.78 |

The Java managed runtime uses the open-source Coordinated Restore at Checkpoint (CRaC) project to provide hook support. The managed Java runtime contains a customized CRaC context implementation that calls your Lambda function’s runtime hooks before completing snapshot creation and after restoring the execution environment from a snapshot.

SnapStart and runtime hooks give you new ways to build your Lambda functions for low startup latency. You can use the pre-snapshot hook to make your Java application as ready as possible for the first invoke. Do as much as possible within your function before the snapshot is taken. This is called priming.

Let's see how we can use priming. First we need to add the dependency on the CRaC project to pom.xml:

```xml
<dependency>
    <groupId>io.github.crac</groupId>
    <artifactId>org-crac</artifactId>
    <version>0.1.3</version>
</dependency>
```


We'll make use of the org.crac.Resource interface, which provides two methods: beforeCheckpoint and afterRestore. The first one, invoked before the snapshot is created, is a good place to implement priming. In this method we make a call to retrieve a product item from DynamoDB with some static id (0 in this case):

```java
productDao.getProduct("0");
```

This in turn calls the DynamoDB client's getItem method, which forces the Jackson marshallers to initialize, quite an expensive one-time operation in the life cycle of the Lambda function. The product doesn't necessarily need to exist in DynamoDB; the main goal of this call is to trigger the class loading and initialization steps. With that we intend to lower the cold start even more.
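For context, the primed call path could look roughly like this. This is a minimal sketch of a DAO built on the AWS SDK for Java v2; the table name "Products" and key attribute "id" are illustrative assumptions, not taken from the example repository:

```java
import java.util.Map;

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.GetItemRequest;
import software.amazon.awssdk.services.dynamodb.model.GetItemResponse;

// Minimal sketch of the DAO whose getProduct call is used for priming.
public class DynamoProductDao {

    private final DynamoDbClient dynamoDbClient = DynamoDbClient.create();

    public Map<String, AttributeValue> getProduct(String id) {
        // GetItem forces the SDK to initialize its marshallers, so invoking it
        // in beforeCheckpoint bakes that one-time work into the snapshot.
        GetItemResponse response = dynamoDbClient.getItem(GetItemRequest.builder()
                .tableName("Products")
                .key(Map.of("id", AttributeValue.builder().s(id).build()))
                .build());
        return response.item();
    }
}
```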

Let's see how it works for each of our scenarios individually. I added the priming implementation for all described cases to my example application here.

1) Pure Java example

We implement the org.crac.Resource interface directly in the Lambda function handler class itself (the one you enable SnapStart on) and call productDao.getProduct("0") in the beforeCheckpoint method. It works out of the box before the snapshot is created. You can add additional logging in the beforeCheckpoint method and find it in CloudWatch Logs during the deployment of your Lambda function.
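
A minimal sketch of what that can look like follows; the handler class name, request/response types and the DynamoProductDao from the sketch above are illustrative assumptions:

```java
import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

import com.amazonaws.services.lambda.runtime.RequestHandler;

// Sketch: the handler registers itself as a CRaC resource and primes
// the DynamoDB call path in beforeCheckpoint.
public class GetProductByIdHandler implements RequestHandler<String, String>, Resource {

    private final DynamoProductDao productDao = new DynamoProductDao();

    public GetProductByIdHandler() {
        // Register this resource so SnapStart invokes the runtime hooks
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        // Priming: exercise the expensive code path before the snapshot is taken
        productDao.getProduct("0");
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        // Nothing to do after restore in this sketch
    }

    @Override
    public String handleRequest(String productId, com.amazonaws.services.lambda.runtime.Context context) {
        return String.valueOf(productDao.getProduct(productId));
    }
}
```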

2) Spring Boot example

It works the same way as in the pure Java example.

3) Micronaut example

First we additionally need the micronaut-crac dependency in pom.xml:

```xml
<dependency>
    <groupId>io.micronaut.crac</groupId>
    <artifactId>micronaut-crac</artifactId>
    <version>1.1.1</version>
    <scope>compile</scope>
</dependency>
```

Then we create a separate priming implementation for our example, as described in the official documentation.

This implementation basically does the same as the pure Java example:

```java
import org.crac.Context;
import org.crac.Resource;

import io.micronaut.crac.OrderedResource;
import jakarta.inject.Singleton; // jakarta.inject is used by Micronaut 3.x

@Singleton
public class ProductAPIResource implements OrderedResource {

    private final ProductDao productDao;

    public ProductAPIResource(ProductDao productDao) {
        this.productDao = productDao;
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        System.out.println("Before Checkpoint");
        // Priming: exercise the DynamoDB call path before the snapshot is taken
        productDao.getProduct("0");
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        System.out.println("After Restore");
    }
}
```

The main difference is that the class implements the io.micronaut.crac.OrderedResource interface, which is the Micronaut Framework's CRaC integration, and is annotated with @Singleton. Together this ensures that SnapStart uses the CRaC API to let the application execute custom code before the snapshot is taken or after the execution environment is restored.

4) Quarkus example

The CRaC implementation is similar to the Micronaut example, as we also create a separate priming implementation, as described in the official documentation.

This implementation looks like this:

```java
import javax.annotation.PostConstruct;              // jakarta.annotation.* for Quarkus 3+
import javax.enterprise.context.ApplicationScoped;  // jakarta.enterprise.context.* for Quarkus 3+

import org.crac.Core;
import org.crac.Resource;

import io.quarkus.runtime.Startup;

@Startup
@ApplicationScoped
public class ProductAPIResource implements Resource {

    private static final ProductDao productDao = new DynamoProductDao();

    @PostConstruct
    public void init() {
        // Register this resource so SnapStart invokes the CRaC runtime hooks
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) throws Exception {
        System.out.println("Before Checkpoint");
        // Priming: exercise the DynamoDB call path before the snapshot is taken
        productDao.getProduct("0");
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) throws Exception {
        System.out.println("After Restore");
    }
}
```

With the two annotations @Startup and @ApplicationScoped we ensure that the bean is created eagerly at application startup, so its registration with the global CRaC context takes place and SnapStart can use the CRaC API to let the application execute custom code before the snapshot is taken or after restoration.

Now that we have explored how to add priming to all our scenarios, let's measure the cold start times after 100 invocations each. We got the following results (in milliseconds):

| Framework | p50 | p90 | p99 |
| --- | --- | --- | --- |
| Pure Lambda | 352.45 | 401.43 | 433.76 |
| Micronaut | 597.91 | 732.01 | 755.53 |
| Quarkus | 459.24 | 493.33 | 510.32 |
| Spring Boot | 600.66 | 1065.37 | 1173.93 |

We see a huge decrease in the cold starts for all scenarios, by up to roughly 900 milliseconds. Even if the effect of priming varies from scenario to scenario (with DynamoDB we see one of the biggest possible optimizations), it is one of the must-have optimization techniques to consider. With priming we also achieved cold starts that are currently comparable to, or even lower than, those with GraalVM Native Image, and they generally look promising enough not to impact your public-facing applications that much.

Measuring end to end AWS API Gateway latency

Measuring the cold start times for AWS Lambda with SnapStart enabled is one thing, but it's more useful to see the full picture and therefore to measure the full end-to-end AWS API Gateway request latency.

Here are the results (in milliseconds) that I got for 100 requests that each produced a cold start, for each scenario with SnapStart enabled and with priming:

| Framework | p50 | p90 | p99 |
| --- | --- | --- | --- |
| Pure Lambda | 877 | 1090 | 1098 |
| Micronaut | 1083 | 1221 | 1570 |
| Quarkus | 946 | 1094 | 1243 |
| Spring Boot | 1068 | 2021 | 2222 |

Measuring additional deployment time for the SnapStart enabled Lambda function

It's logical that each new deployment of a Lambda function with SnapStart enabled takes longer than without SnapStart, because of the snapshotting and, possibly, the pre-snapshot hook execution. We'd like to measure how much additional time it takes. I have only run my experiments with sam deploy (without hot deployment). I excluded the time to upload the source code, which is size- and framework-dependent, and only focused on the deployment itself, so the difference between pure Java and the frameworks used (Micronaut, Quarkus or Spring Boot) is negligible.

- Without SnapStart and without using a version and alias in the AWS SAM template, deployment took approx. 31 seconds.
- Without SnapStart and with the initial creation of a version and alias in the AWS SAM template, deployment took approx. 1 minute.
- Without SnapStart and with creating a newer version and modifying the existing alias in the AWS SAM template, deployment took approx. 41 seconds.

Then I enabled SnapStart on one Lambda function, which also requires using a version and alias (AutoPublishAlias: liveVersion).
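
For reference, in a SAM template this boils down to adding the SnapStart property next to AutoPublishAlias. The following is a minimal sketch; the resource name and handler path are illustrative, not taken from the example repository:

```yaml
GetProductByIdFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: com.example.product.GetProductByIdHandler::handleRequest
    Runtime: java11
    AutoPublishAlias: liveVersion       # SnapStart only applies to published versions
    SnapStart:
      ApplyOn: PublishedVersions        # take a snapshot for each published version
```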

- With SnapStart and with the initial creation of a version and alias in the AWS SAM template, deployment took approx. 3 minutes.
- With SnapStart and with creating a newer version and modifying the existing alias in the AWS SAM template, deployment took approx. 2 minutes and 40 seconds.

So we observe that enabling SnapStart on one Lambda function increases the deployment time by 2 or even more minutes, depending on the scenario (creating or modifying the alias).

When we re-ran the experiment with 2 Lambda functions, we observed that the deployment time increased only by several seconds in all scenarios, as SAM deploys all Lambda functions in parallel.

Conclusions and next steps

In this part of the series we discussed a further optimization technique called priming and observed that it significantly reduces the cold start time, which is why we consider this technique a must-have, even if it means slightly modifying the code. We also measured the end-to-end AWS API Gateway request latency for cold starts with SnapStart and priming enabled to see the overall result. Finally, we explored how much additional time the deployment of a Lambda function with SnapStart enabled takes and saw that it adds 2 minutes or even more. There is a lot of room for improvement here to provide a smoother developer experience.

In the next part of the series we'll look at scenarios involving other AWS services like SQS and SNS to see how enabling SnapStart and priming affects them. We're also going to test our application under steady traffic to see whether that has any effect on decreasing the resume times and therefore the cold starts.
