AWS SnapStart - Part 26 Measuring cold and warm starts with Java 21 using different garbage collection algorithms
Vadym Kazulkin
Posted on September 23, 2024
Introduction
In the previous parts of our series, we measured the cold starts of the Lambda function with the Java 21 runtime without SnapStart enabled, with SnapStart enabled, and with the DynamoDB invocation priming optimization applied, using different Lambda memory settings, Lambda deployment artifact sizes, Java compilation options, (a)synchronous HTTP clients and different Lambda layers. For all these measurements we used the default garbage collection algorithm, G1.
In this article we'd like to explore the impact of Java garbage collection algorithms on the performance of the Lambda function with the Java 21 runtime. We'll also re-measure everything for G1 to have comparable results, with the same minor Java 21 version in use for all garbage collection algorithms.
Java garbage collection algorithms
For our measurements we'll use the following Java garbage collection algorithms with their default settings (please refer to the linked documentation for more detailed information about each algorithm):
- Garbage-First (G1) Garbage Collector. This is the garbage collection algorithm used by default. You can set it explicitly in the AWS SAM template by adding -XX:+UseG1GC to the JAVA_TOOL_OPTIONS environment variable.
- The Parallel Collector. You can set it explicitly in the AWS SAM template by adding -XX:+UseParallelGC to the JAVA_TOOL_OPTIONS environment variable.
- Shenandoah GC. Oracle JDK doesn't provide it, but Amazon Corretto 21 JDK does. You can set it explicitly in the AWS SAM template by adding -XX:+UseShenandoahGC to the JAVA_TOOL_OPTIONS environment variable.
- The Z Garbage Collector. There are 2 different ZGC algorithms: the default one and the newer, generational one. You can set them explicitly in the AWS SAM template by adding -XX:+UseZGC (default) or -XX:+UseZGC -XX:+ZGenerational (generational) to the JAVA_TOOL_OPTIONS environment variable.
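To confirm that the flag set in JAVA_TOOL_OPTIONS was actually picked up, you can ask the running JVM which collectors it registered. The following is a minimal sketch using the standard java.lang.management API; the class name GcInfo and its method are hypothetical helpers and not part of the sample application. The exact bean names depend on the JVM, but they typically contain "G1", "PS" (Parallel Collector) or "Shenandoah".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (assumption), not part of the article's sample application.
final class GcInfo {

    // Returns the names of the garbage collector beans registered by the running JVM.
    static List<String> activeCollectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // In a Lambda function, the same call could be logged once during initialization.
        System.out.println("Active GC beans: " + activeCollectorNames());
    }
}
```

Logging this once in the static initializer or constructor of the Lambda handler should be enough, as the collector cannot change during the lifetime of the JVM.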
Measuring cold and warm starts with Java 21 using different garbage collection algorithms
In our experiment we'll use a slightly modified version of the application introduced in part 9. You can find the application code here. There are basically 2 Lambda functions, both of which respond to API Gateway requests and retrieve the product from DynamoDB by the id received from the API Gateway. One Lambda function, GetProductByIdWithPureJava21LambdaWithGCAlg, can be used with and without SnapStart, and the second one, GetProductByIdWithPureJava21LambdaAndPrimingWithGCAlg, uses SnapStart and DynamoDB request invocation priming.
The results below are based on reproducing more than 100 cold starts and approximately 100,000 warm starts in an experiment that ran for approximately 1 hour. For it (and for the experiments from my previous article) I used the load test tool hey, but you can use whatever tool you want, like Serverless-artillery or Postman. We ran the experiments with 1024 MB of memory for the Lambda functions and with JAVA_TOOL_OPTIONS: "-XX:+TieredCompilation -XX:TieredStopAtLevel=1" (Java client compilation without profiling), which offers a very good trade-off between cold and warm start times.
Unfortunately, I couldn't get the Lambda function to start with the Z Garbage Collector (neither the default nor the generational variant). It ran into the following error:
Failed to commit memory (Operation not permitted)
[error][gc] Forced to lower max Java heap size from 872M(100%) to 0M(0%)
[error][gc] Failed to allocate initial Java heap (512M)
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
I tried memory settings larger than 1024 MB, such as 2048 MB and even more, but the same error still appeared.
Let's look at the results of our measurements with the other 3 garbage collection algorithms.
The abbreviation c stands for cold start and w for warm start.
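As a reminder of how to read the p50 to p99.9 columns in the tables, here is a minimal sketch of one common percentile definition, the nearest-rank method. This is an assumption for illustration only: the article does not state how the percentiles were computed, and other tools may interpolate between samples and give slightly different values. The class name Percentiles and the sample durations in main are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper (assumption), shown only to illustrate how percentiles can be read.
final class Percentiles {

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samples.clone(); // keep the caller's array unchanged
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up durations in ms, not measurements from this article.
        double[] durations = {5.4, 6.1, 5.9, 7.3, 48.0, 5.5, 6.0, 16.2};
        System.out.println("p50 = " + percentile(durations, 50));
        System.out.println("p99 = " + percentile(durations, 99));
    }
}
```

With only a little more than 100 cold starts per configuration, the nearest-rank p99.9 is simply the largest sample, which is consistent with the c p99.9 and c max columns being almost identical in the tables.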
Cold (c) and warm (w) start time without SnapStart enabled in ms:
GC Algorithm | c p50 | c p75 | c p90 | c p99 | c p99.9 | c max | w p50 | w p75 | w p90 | w p99 | w p99.9 | w max |
---|---|---|---|---|---|---|---|---|---|---|---|---|
G1 | 3655.17 | 3725.25 | 3811.88 | 4019.25 | 4027.30 | 4027.83 | 5.46 | 6.10 | 7.10 | 16.79 | 48.06 | 1929.79 |
Parallel Collector | 3714.10 | 3789.09 | 3857.87 | 3959.44 | 4075.89 | 4078.25 | 5.55 | 6.20 | 7.10 | 15.38 | 130.13 | 2017.92 |
Shenandoah | 3963.40 | 4019.25 | 4096.30 | 4221.00 | 4388.78 | 4390.76 | 5.82 | 6.45 | 7.39 | 17.06 | 71.02 | 2159.21 |
Cold (c) and warm (w) start time with SnapStart enabled without Priming in ms:
GC Algorithm | c p50 | c p75 | c p90 | c p99 | c p99.9 | c max | w p50 | w p75 | w p90 | w p99 | w p99.9 | w max |
---|---|---|---|---|---|---|---|---|---|---|---|---|
G1 | 1867.27 | 1935.68 | 2152.02 | 2416.57 | 2426.25 | 2427.35 | 5.47 | 6.11 | 7.05 | 17.41 | 51.24 | 1522.04 |
Parallel Collector | 1990.62 | 2047.12 | 2202.07 | 2402.12 | 2418.99 | 2419.32 | 5.68 | 6.35 | 7.45 | 18.04 | 147.83 | 1577.21 |
Shenandoah | 2195.47 | 2301.07 | 2563.37 | 3004.89 | 3029.01 | 3030.36 | 5.73 | 6.41 | 7.51 | 17.97 | 75.00 | 1843.34 |
Cold (c) and warm (w) start time with SnapStart enabled and with DynamoDB invocation Priming in ms:
GC Algorithm | c p50 | c p75 | c p90 | c p99 | c p99.9 | c max | w p50 | w p75 | w p90 | w p99 | w p99.9 | w max |
---|---|---|---|---|---|---|---|---|---|---|---|---|
G1 | 833.50 | 875.34 | 1089.53 | 1205.26 | 1269.56 | 1269.80 | 5.46 | 6.10 | 7.16 | 16.39 | 46.19 | 499.13 |
Parallel Collector | 900.18 | 975.12 | 1058.41 | 1141.94 | 1253.17 | 1253.99 | 5.82 | 6.61 | 7.75 | 16.87 | 49.64 | 487.73 |
Shenandoah | 1065.84 | 1131.71 | 1331.96 | 1473.44 | 1553.59 | 1554.95 | 5.77 | 6.40 | 7.39 | 17.20 | 65.06 | 500.48 |
Conclusion
In this article we explored the impact of Java garbage collection algorithms (G1, Parallel Collector and Shenandoah) on the performance of the Lambda function with the Java 21 runtime. We saw quite a difference between the performance of those algorithms. With the default settings, G1 (the default collector) gave us the lowest cold and warm start times at most percentiles, sometimes by far. With SnapStart and priming of the DynamoDB request, the performance results are, as expected, much closer to each other.
Please refer to the documentation of each garbage collection algorithm to tune settings like min and max memory, which can provide a significant improvement in performance, and do your own measurements.
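When you tune such settings, it helps to verify what the JVM actually ended up with before comparing measurements. The sketch below uses only standard JDK APIs; the class name JvmSettings is a hypothetical helper, and the assumption that flags passed via JAVA_TOOL_OPTIONS show up in the reported input arguments should be checked in your own environment.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper (assumption), not part of the article's sample application.
final class JvmSettings {

    // Maximum heap size the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Arguments the JVM was started with, such as GC or heap size flags.
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("JVM arguments: " + jvmArguments());
    }
}
```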