Ranking AWS Lambda Runtimes with ChatGPT 4

Davo Galavotti

Posted on April 1, 2023

In the ever-evolving world of cloud computing, AWS Lambda functions have become increasingly popular due to their serverless nature and ease of use. With numerous languages and runtime options available, it can be difficult to decide which one is the best fit for your project.

In this post, I'll show you how I used ChatGPT-4 to analyze the cold start duration and average duration of different AWS Lambda runtimes, based on data from the Lambda Cold Starts analysis project by Maxime David.

Through this exercise, I learned a ton about the relative performance and trade-offs between languages by comparing cold start, initialization, and average durations.

But before getting into it, a quick disclaimer: this is not advice for choosing the next runtime for your AWS Lambda functions. Rather, the insights from Maxime David's analysis make it easier for developers to make informed decisions.

A screenshot of the Lambda Cold Starts analysis UI

Methodology

The Lambda Cold Start tool generates data by deploying a "hello-world" function in various AWS-supported and custom runtimes. Functions are packaged in zip files, uploaded to an S3 bucket, and invoked as cold starts 10 times daily. The REPORT log line, containing init duration, max memory used, and other information, is saved to a DynamoDB table. This information is aggregated, and a new JSON file is created and committed to Maxime's GitHub repo.
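Maxime's tool has its own implementation for this step, but as a rough illustration (not his actual code), extracting the metrics from a REPORT log line could look like this, using the field names Lambda actually emits:

```python
import re

# Example REPORT line emitted by AWS Lambda after a cold start (values invented)
report = (
    "REPORT RequestId: 5e7b8c1a-0000-1111-2222-333344445555 "
    "Duration: 3.12 ms Billed Duration: 4 ms Memory Size: 128 MB "
    "Max Memory Used: 14 MB Init Duration: 16.45 ms"
)

def parse_report(line: str) -> dict:
    """Pull duration and memory metrics out of a Lambda REPORT log line."""
    pattern = (
        r"Duration: (?P<duration>[\d.]+) ms.*"
        r"Max Memory Used: (?P<max_memory>\d+) MB"
        r"(?:.*Init Duration: (?P<init>[\d.]+) ms)?"
    )
    m = re.search(pattern, line)
    assert m is not None, "not a REPORT line"
    return {k: float(v) for k, v in m.groupdict().items() if v is not None}

print(parse_report(report))
# {'duration': 3.12, 'max_memory': 14.0, 'init': 16.45}
```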

A GitHub Pages-hosted static website fetches this JSON file, displaying results on a simple UI. The data is updated daily, ensuring accurate and up-to-date comparisons.

Comparing Runtimes using ChatGPT-4

I started by copying the data generated by Maxime's tool and pasting it into ChatGPT using Model: GPT-4. No prompt, just the plain JSON file.

ChatGPT automatically converted the JSON into a perfectly formatted Markdown table, which was quite helpful for visualizing the metadata and metrics.

You can customize how the table is presented.

Prompt:

```
Make a markdown table and sort the runtimes by Avg. Cold Start Duration (in seconds). Add averageMemoryUse. Use displayName as Runtime for the Table Header. Don't include runtime: value in the header.
```
| Runtime | Avg. Cold Start Duration (sec) | Avg. Memory Used (MB) |
| --- | --- | --- |
| Rust (prov.al2) | 0.016 | 14.1 |
| Go (provided) | 0.052 | 18.9 |
| Go 1.x | 0.089 | 29.4 |
| Python 3.9 | 0.112 | 36.0 |
| Python 3.7 | 0.115 | 35.4 |
| Python 3.8 | 0.117 | 38.2 |
| GraalVM Java17 (prov.al2) | 0.121 | 33.5 |
| Dotnet7 AOT (prov.al2) | 0.120 | 34.0 |
| Ruby 2.7 | 0.142 | 30.1 |
| Node.js 12.x | 0.154 | 55.0 |
| Dotnetcore 3.1 | 0.156 | 51.0 |
| Node.js 16.x | 0.158 | 57.4 |
| Node.js 14.x | 0.177 | 56.5 |
| Node.js 18.x | 0.217 | 67.0 |
| Dotnet 6 | 0.225 | 59.8 |
| Java 11 Snapstart | 0.228 | 70.4 |
| Quarkus (prov.al2) | 0.233 | 54.1 |
| Java 11 | 0.471 | 74.5 |
| Java 8 | 0.532 | 71.9 |
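You don't strictly need ChatGPT for this step. Assuming the payload roughly matches the snippet below (a guess at the shape; displayName and averageMemoryUse come from the prompt, while averageColdStartDuration is a hypothetical field name), the same table falls out of a few lines of Python:

```python
import json

# A guess at the payload shape -- only three of the nineteen runtimes shown.
payload = json.loads("""
[
  {"runtime": "rust", "displayName": "Rust (prov.al2)",
   "averageColdStartDuration": 0.016, "averageMemoryUse": 14.1},
  {"runtime": "go", "displayName": "Go (provided)",
   "averageColdStartDuration": 0.052, "averageMemoryUse": 18.9},
  {"runtime": "java8", "displayName": "Java 8",
   "averageColdStartDuration": 0.532, "averageMemoryUse": 71.9}
]
""")

# Sort by cold start duration and print a Markdown table.
rows = sorted(payload, key=lambda r: r["averageColdStartDuration"])
print("| Runtime | Avg. Cold Start Duration (sec) | Avg. Memory Used (MB) |")
print("| --- | --- | --- |")
for r in rows:
    print(f"| {r['displayName']} | {r['averageColdStartDuration']:.3f} | {r['averageMemoryUse']} |")
```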

Casual data analysis using prompt engineering

I knew Rust was the fastest runtime, thanks to Maxime's tool, but I wanted to know how much faster it was in simple terms.

Prompt:

```
How much faster Rust is compared to Python 3.9, 3.8, 3.7 and Node 12, 14, 16, 18? Use NLP by comparing it as "X time faster".
```

The analysis was extremely helpful because it automatically assumed "average duration" would be a useful metric for the comparison.

ChatGPT compared the average durations of 🦀 Rust, 🐍 Python 3.7, 3.8, 3.9, Node.js 12.x, 14.x, 16.x, 18.x, and Go. Based on this comparison, Rust emerged as the fastest runtime, followed closely by Go. Node.js and Python runtimes showcased varying degrees of performance, with some versions faring better than others.

ChatGPT 4 results for How much faster Rust is compared to Python and Node

Later, I asked it to "Add Go to the mix".

ChatGPT 4 results for How much faster Rust and Go, compared to Python and Node
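For what it's worth, these "X times faster" figures are plain ratios of average durations, so they're easy to sanity-check without a model in the loop. A minimal sketch, with placeholder durations invented so the ratios line up with the figures ChatGPT reported:

```python
def times_faster(baseline: float, other: float) -> str:
    """Express two average durations as an 'X times faster' comparison."""
    return f"{other / baseline:.2f}x faster"

# Placeholder average durations in ms -- the post doesn't reproduce the raw
# values, so these are invented purely to show the arithmetic.
rust, python39, node18 = 5.0, 16.0, 51.45

print("Rust vs Python 3.9:", times_faster(rust, python39))   # 3.20x faster
print("Rust vs Node.js 18.x:", times_faster(rust, node18))   # 10.29x faster
```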

Cold Starts, Best vs. Worst

Using NLP, I learned that Rust is 3.20x faster than Python 3.9 and 10.29x faster than Node.js 18.x, which is impressive. But I wanted to compare cold starts as well (after all, that was the core idea of Maxime's tool) and to see how the best performer stacks up against the worst.

Prompt:

```
compare Cold Starts now, how much better Rust is compared to Python 3.9, 3.8, 3.7 and Node 12, 14, 16, 18? Use NLP by comparing it as "cold start time is X times shorter".
```

ChatGPT 4 results for how much shorter Rust's cold starts are compared to Python and Node

Prompt:

```
how much better Rust is compared with the worst performers?
```

ChatGPT 4 results for how much better Rust is compared with the worst performers
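Again, the underlying arithmetic is just a ratio, this time over cold start averages. Using the best and worst values from the table above (this reproduces the spirit of the comparison, not ChatGPT's exact output):

```python
# Cold start averages (seconds) from the table above -- best and worst only.
cold_starts = {
    "Rust (prov.al2)": 0.016,
    "Java 8": 0.532,
}

best = min(cold_starts, key=cold_starts.get)
worst = max(cold_starts, key=cold_starts.get)
ratio = cold_starts[worst] / cold_starts[best]
print(f"{best} cold start is {ratio:.2f}x shorter than {worst}")
# Rust (prov.al2) cold start is 33.25x shorter than Java 8
```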

Emojis make everything better

To visualize the performance of each runtime in a more evocative way, I asked ChatGPT to compare the runtimes using emojis to represent their relative speeds:

ChatGPT 4 results for comparing the runtimes using emojis

So, Rust is 🚀 (Rocket)
Go: 🏎️ (Race Car)
Python 3.9: 🚗 (Car)
Python 3.8: 🛵 (Scooter)
Python 3.7: 🐢 (Turtle)
Node.js 12.x: 🚂 (Train)
Node.js 14.x: 🚆 (Light Rail)
Node.js 16.x: 🛴 (Kick Scooter)
Node.js 18.x: 🚲 (Bicycle)

It's a fun and creative way to condense this light analysis into something shareable on Twitter.

Hitting the token limit

It's all fun and games until you start noticing an issue.

Since I was casually doing data analysis in a single long-running chat, hitting the ChatGPT-4 token limit (32K) was bound to happen.

What happens when you hit the token limit? GPT does what it does best: it fills the gaps by predicting the most plausible next value. Even for light data analysis this is a problem, because ChatGPT will still happily respond to a prompt asking to sort the table by some other criteria, but the data it uses will no longer be right. It wasn't even skewing the data; it was straight-up wrong.

Prompt result came back with a false statement

So, if I wanted to keep doing "data analysis", I'd need to reset the context, because ChatGPT also counts the chat history against that token budget. The fix? Just paste the JSON file once again and start over.
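One way to avoid being surprised is to count the tokens in the payload before pasting it. A minimal sketch using the tiktoken library (assuming pip install tiktoken, and a hypothetical cold-starts.json filename):

```python
import tiktoken

# Tokenizer used by GPT-4 models.
enc = tiktoken.encoding_for_model("gpt-4")

# Hypothetical local copy of the payload pasted into ChatGPT.
with open("cold-starts.json") as f:
    payload = f.read()

tokens = len(enc.encode(payload))
print(f"{tokens} tokens")

# Leave headroom: the chat history counts against the same budget,
# and the limit depends on the model variant (8K or 32K for GPT-4).
if tokens > 8192 // 2:
    print("The payload alone eats more than half of an 8K context window.")
```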

Prompt:

```
Make a markdown table and sort the runtimes by Avg. Cold Start Duration (in seconds). Add averageMemoryUse. Use displayName as Runtime for the Table Header. Don't include [runtime: value] in the header.
```

Prompt result: the regenerated markdown table after re-pasting the JSON

Takeaways

The Lambda Cold Starts analysis by Maxime David provides really valuable insights into the performance of various AWS Lambda runtimes, making it easier for developers to choose the right runtime for their projects.

ChatGPT proved to be a useful tool for processing the JSON payload and presenting the data in an accessible, engaging format, and in the process I learned a bunch of things I wasn't aware of.

Again, it's important for me to clarify that this is not advice for choosing the next runtime for your AWS Lambda functions and serverless workflows. You should weigh the relative performance and trade-offs between languages, along with a whole gamut of other variables.

Regardless, it was a simple and fun exercise to weigh the benefits of faster runtimes like Rust and Go against the availability of libraries for languages like Python and Node.js.

If you're interested in learning more about Serverless from the perspective of a product designer & frontend developer, follow me on Twitter, @pixelbeat, as I'm working my way into Cloud Development.

Source Data: Lambda Cold Starts analysis by Maxime David.
Source Code: Lambda Cold Starts GitHub Repo by Maxime David.
