Ranking AWS Lambda Runtimes with ChatGPT 4

In the ever-evolving world of cloud computing, AWS Lambda functions have become increasingly popular due to their serverless nature and ease of use. With numerous languages and runtime options available, it can be difficult to decide which one is the best fit for your project.

In this post, I'll show you how I used ChatGPT with GPT-4 to analyze the cold start duration and average duration of different AWS Lambda runtimes, based on data from the Lambda Cold Starts analysis project by Maxime David.

Through this exercise, I learned a ton about the relative performance and trade-offs between languages by comparing cold starts, initialization and average duration.

But before getting into it, a quick disclaimer: this is not advice for choosing the runtime for your next AWS Lambda function. It's the insights provided by Maxime David's analysis that make it easier for developers to make informed decisions.

Methodology

The Lambda Cold Start tool generates data by deploying a "hello-world" function in various AWS-supported and custom runtimes. Functions are packaged in zip files, uploaded to an S3 bucket, and invoked as cold starts 10 times daily. The REPORT log line, containing init duration, max memory used, and other information, is saved to a DynamoDB table. This information is aggregated, and a new JSON file is created and committed to Maxime's GitHub repo.
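To make the REPORT step concrete, here is a minimal sketch of pulling the metrics out of one such log line. It is not Maxime's code: the sample line and the `parse_report` helper are made up for illustration, and the sketch assumes the usual layout of a Lambda REPORT line (tab-separated `Key: value` pairs).

```python
# Hypothetical sample line; real ones come from the function's CloudWatch logs.
SAMPLE = (
    "REPORT RequestId: 3f2c1a9e-0000-0000-0000-000000000000\t"
    "Duration: 1.27 ms\tBilled Duration: 18 ms\t"
    "Memory Size: 128 MB\tMax Memory Used: 14 MB\t"
    "Init Duration: 16.02 ms"
)


def parse_report(line):
    """Split a REPORT line into a dict; ms/MB values become floats."""
    metrics = {}
    for part in line.strip().split("\t"):
        key, sep, value = part.partition(": ")
        if not sep:
            continue  # not a "Key: value" pair, skip it
        if key.startswith("REPORT "):
            key = key[len("REPORT "):]  # first field carries the REPORT prefix
        number, _, unit = value.partition(" ")
        if unit in ("ms", "MB"):
            metrics[key] = float(number)
        else:
            metrics[key] = value  # e.g. the request id stays a string
    return metrics


report = parse_report(SAMPLE)
print(report["Init Duration"], report["Max Memory Used"])
```

A record shaped like this is the kind of thing that could then be stored per invocation and aggregated later.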

A GitHub Pages-hosted static website fetches this JSON file, displaying results on a simple UI. The data is updated daily, ensuring accurate and up-to-date comparisons.

Comparing Runtimes using ChatGPT-4

I started by copying the data generated by Maxime's tool and pasting it into ChatGPT with the model set to GPT-4. No prompts, just the plain JSON file.

ChatGPT automatically converted the JSON into a perfectly formatted Markdown table, which was quite helpful for visualizing the metadata and metrics.

You can customize how the table is presented.

Prompt


 Make a markdown table and sort the runtimes by Avg. Cold Start Duration (in seconds). Add averageMemoryUse. Use displayName as Runtime for the Table Header. Don't include runtime: value in the header.

Runtime	Avg. Cold Start Duration (sec)	Avg. Memory Used (MB)
Rust (prov.al2)	0.016	14.1
Go (provided)	0.052	18.9
Go 1.x	0.089	29.4
Python 3.9	0.112	36.0
Python 3.7	0.115	35.4
Python 3.8	0.117	38.2
GraalVM Java17 (prov.al2)	0.121	33.5
Dotnet7 AOT (prov.al2)	0.120	34.0
Ruby 2.7	0.142	30.1
Node.js 12.x	0.154	55.0
Dotnetcore 3.1	0.156	51.0
Node.js 16.x	0.158	57.4
Node.js 14.x	0.177	56.5
Node.js 18.x	0.217	67.0
Dotnet 6	0.225	59.8
Java 11 Snapstart	0.228	70.4
Quarkus (prov.al2)	0.233	54.1
Java 11	0.471	74.5
Java 8	0.532	71.9
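The same sort-and-format step is easy to reproduce in code, which is handy for checking a table like the one above. This is a minimal sketch under assumptions: `displayName` and `averageMemoryUse` come from the prompt, but the `averageColdStartDuration` key, the millisecond unit, and the three sample records are hypothetical stand-ins for the real JSON file.

```python
import json

# Made-up records that mirror three rows of the table; key names partly assumed.
SAMPLE_JSON = """
[
  {"displayName": "Python 3.9", "averageColdStartDuration": 112.0, "averageMemoryUse": 36.0},
  {"displayName": "Rust (prov.al2)", "averageColdStartDuration": 16.0, "averageMemoryUse": 14.1},
  {"displayName": "Go (provided)", "averageColdStartDuration": 52.0, "averageMemoryUse": 18.9}
]
"""


def to_table(raw):
    """Sort runtimes by cold start duration and emit tab-separated rows."""
    rows = sorted(json.loads(raw), key=lambda r: r["averageColdStartDuration"])
    lines = ["Runtime\tAvg. Cold Start Duration (sec)\tAvg. Memory Used (MB)"]
    for r in rows:
        seconds = r["averageColdStartDuration"] / 1000  # assumed ms -> sec
        lines.append(f"{r['displayName']}\t{seconds:.3f}\t{r['averageMemoryUse']:.1f}")
    return "\n".join(lines)


print(to_table(SAMPLE_JSON))
```

Because the sort happens in code, the row order is guaranteed to follow the numbers, which makes it a useful cross-check for a table produced by a language model.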

Casual data analysis using prompt engineering

I knew Rust was the fastest runtime, thanks to Maxime's tool, but I wanted to know how much faster it was in simple terms.

Prompt


 How much faster Rust is compared to Python 3.9, 3.8, 3.7 and Node 12, 14, 16, 18? Use NLP by comparing it as "X time faster".

The analysis was extremely helpful because it automatically assumed that average duration would be a useful metric to compare.

ChatGPT compared the average durations of 🦀 Rust, 🐍 Python 3.7, 3.8, 3.9, Node.js 12.x, 14.x, 16.x, 18.x, and Go. Based on this comparison, Rust emerged as the fastest runtime, followed closely by Go. Node.js and Python runtimes showcased varying degrees of performance, with some versions faring better than others.

Go was not part of my original prompt; I brought it in later by asking ChatGPT to "Add Go to the mix."

Cold Starts, Best vs. Worst

Using natural language, I learned that Rust is 3.20x faster than Python 3.9 and 10.29x faster than Node.js 18.x in average duration, which is impressive. But I also wanted to compare cold starts, since that is the core idea of Maxime's tool, and to learn how the best performer compares against the worst.

Prompt:


 compare Cold Starts now, how much better Rust is compared to Python 3.9, 3.8, 3.7 and Node 12, 14, 16, 18? Use NLP by comparing it as "cold start time is X times shorter".

Prompt:


 how much better Rust is compared with the worst performers?
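The "X times shorter" figures are plain ratios, so they are easy to double-check by hand. Below is a minimal sketch using the cold start values from the table earlier in this post; the `times_shorter` helper is hypothetical, and because the table values are rounded, the ratios are approximate.

```python
# Avg. cold start duration in seconds, copied from the table in this post.
COLD_START_SEC = {
    "Rust (prov.al2)": 0.016,
    "Python 3.9": 0.112,
    "Python 3.8": 0.117,
    "Python 3.7": 0.115,
    "Node.js 12.x": 0.154,
    "Node.js 14.x": 0.177,
    "Node.js 16.x": 0.158,
    "Node.js 18.x": 0.217,
    "Java 8": 0.532,
}


def times_shorter(baseline, other, data=COLD_START_SEC):
    """How many times shorter the baseline's cold start is than the other's."""
    return data[other] / data[baseline]


baseline = "Rust (prov.al2)"
for name in COLD_START_SEC:
    if name != baseline:
        ratio = times_shorter(baseline, name)
        print(f"Rust's cold start time is {ratio:.1f} times shorter than {name}'s")

# The worst performer is simply the entry with the largest duration.
worst = max(COLD_START_SEC, key=COLD_START_SEC.get)
print("Worst performer:", worst)
```

With the table's numbers, Rust's cold start comes out about 7 times shorter than Python 3.9's and about 33 times shorter than Java 8's, the slowest entry in the table.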

Emojis make everything better

To visualize the performance of each runtime in a more evocative way, I asked ChatGPT to compare the runtimes using emojis to represent their relative speeds:

Rust: 🚀 (Rocket)
Go: 🏎️ (Race Car)
Python 3.9: 🚗 (Car)
Python 3.8: 🛵 (Scooter)
Python 3.7: 🐢 (Turtle)
Node.js 12.x: 🚂 (Train)
Node.js 14.x: 🚆 (Light Rail)
Node.js 16.x: 🛴 (Kick Scooter)
Node.js 18.x: 🚲 (Bicycle)

It's a fun and creative way to condense this light analysis and share it on Twitter.

Hitting the token limit

It's all fun and games until you start noticing an issue.

Since I was doing casual data analysis in one long chat, hitting the ChatGPT 4 token limit (32K) was to be expected.

What happens when you hit the token limit? Well, GPT does what it does best: it fills the gaps by predicting the next most likely value. Even for light data analysis, this is a problem. ChatGPT will still respond to a prompt that asks it to sort the table by another criterion, but the data it uses will not be right. It wasn't just skewing the data; the numbers were straight-up wrong.

So, if I wanted to keep doing "data analysis", I needed to reset the context, because ChatGPT also counts the chat history against that token budget. The fix? Just paste the JSON file once again and start over.

Prompt:


 Make a markdown table and sort the runtimes by Avg. Cold Start Duration (in seconds). Add averageMemoryUse. Use displayName as Runtime for the Table Header. Don't include [runtime: value] in the header.
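One way to see the reset coming is to keep a rough running count of how much of the budget the conversation has used. This is only a sketch: the four-characters-per-token figure is a common rule of thumb for English text rather than a real tokenizer, JSON usually costs more tokens than prose of the same length (so treat the estimate as a floor), and `estimate_tokens`, `should_reset`, and the 80% safety margin are all hypothetical choices.

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb, not an exact tokenizer


def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def should_reset(messages, limit, safety=0.8):
    """True once the chat history is estimated to fill `safety` of the limit."""
    used = sum(estimate_tokens(m) for m in messages)
    return used >= limit * safety


# Every prompt and every reply counts, since the history shares the budget.
history = [
    "<pasted JSON file>",
    "Make a markdown table and sort the runtimes by Avg. Cold Start Duration",
    "<ChatGPT's table>",
]
print(should_reset(history, limit=32_000))
```

When the check flips to true, that is the moment to start a fresh chat and paste the JSON again, rather than trusting whatever numbers come back next.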

Takeaways

The Lambda Cold Starts analysis by Maxime David provides really valuable insights into the performance of various AWS Lambda runtimes, making it easier for developers to choose the right runtime for their projects.

ChatGPT proved to be a useful tool for processing the JSON payload and presenting the data in an accessible and engaging format. In the process, I learned a bunch of things I was not aware of.

Again, it's extremely important for me to clarify that this is not advice for choosing your next runtime for your AWS Lambda functions and serverless workflows. You should consider the relative performance and trade-offs between languages, and a whole range of other variables.
Regardless, it was a simple and fun exercise to weigh the benefits of faster runtimes like Rust and Go against the availability of libraries for languages like Python and Node.js.

If you're interested in learning more about Serverless from the perspective of a product designer & frontend developer, follow me on Twitter, @pixelbeat, as I'm working my way into Cloud Development.

Source Data: Lambda Cold Starts analysis by Maxime David.
Source Code: Lambda Cold Starts GitHub Repo by Maxime David.


Davo Galavotti
