Serverless Design Principles
Kevin Wu
Posted on March 25, 2022
Introduction
Serverless is becoming more and more popular these days. It has always
been an interesting space for me because I'm interested in the
stateless, functional style of programming. I started working on Lambda
functions essentially as my first project as an SDE at Amazon back in
2017, worked with one of our purely serverless data storage services at
Amazon Fashion, eventually made my way to the actual Lambda backend
team, and now at Microsoft I find myself working with Azure Functions
again. Obviously, Lambda has not really existed for that long, so I
think I've basically maxed out the possible number of years of
serverless experience. I hope to highlight some of the more interesting
differences between more "classic" design with servers and serverless
design.
Motivations
I wanted to start with why we would use Lambda/Functions over classic
VMs or Kubernetes clusters. The original motivation for Lambda was
mostly to save costs, but later on we noticed some efficiencies that
could only be realized at Lambda scale.
Saving costs
There are a lot of web services out there that serve less than
1 transaction per second but still cost a lot in VM usage. The
idea was that there should be an efficient way to schedule a lot
of these web services on the same hardware so that we could save a lot
of money.
Efficiencies at scale, minimizing resource contention with data
After running Lambda for a while, we realized there were ways that
we could schedule work really efficiently. Lambda has the data to
analyze different workloads, see what their bottleneck resource is,
and schedule them such that the functions would not have to contend for
the same resource at the same time. For example, it is optimal to
schedule a memory-heavy workload alongside a compute-heavy workload, so they
are much less likely to contend for the same resources. There is an
excellent talk by Marc Brooker about this you can find here:
https://youtu.be/xmacMfbrG28?t=1310.
It's just easier
I'd be remiss if I didn't include this, but a lot of the time, Lambda
and its ilk are just the easiest services to set up, requiring little
knowledge of server infrastructure, and that makes them accessible
to a broad audience. Last year, I threw together a demo of a bookstore
to give a talk on design for the CS department at my alma mater, and it
was just easier to use API Gateway backed by Lambda and DynamoDB, so I
didn't have to really think about servers at all.
Key ideas
These are some key ideas that will come up repeatedly in our best
practices. I'm highlighting these specifically because they differ from
traditional "serverful" architecture.
Concurrency
We use the number of concurrent invocations to talk about the scale of a
Function, rather than the more traditional requests per second or
capacity. You can calculate your concurrency by multiplying your
requests per second by the expected latency of each request; see
Little's Law. There is actually quite a lot of content in Marc's talk
that I linked to earlier about this if you want a deeper dive on why
this is the case. The main reason is that concurrency is a measure that
doesn't depend on hardware (hence serverless, right?). Concurrency also
takes into account how efficiently you're responding to requests rather
than just how many requests you're getting. One common ticket I'd see at
Lambda was a team wondering why they were being throttled when they had
low requests per second (more on this later).
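As a quick illustration of the calculation (the traffic numbers here are made up):

requests_per_second = 50   # the arrival rate, lambda
latency_seconds = 0.2      # the wait time, W

# Little's Law: concurrency L = lambda * W
concurrency = requests_per_second * latency_seconds
print(concurrency)  # 10.0 concurrent invokes

# The same traffic at 3 seconds of latency needs 150 concurrent
# sandboxes: the load didn't change, but the concurrency grew 15x.
print(requests_per_second * 3.0)  # 150.0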
Cold starts
The big issue with serverless that people like to talk about is the
higher latency from cold starts. A cold start[1] is essentially when your
execution environment needs to both prepare for the execution, and then
actually perform the execution. At Lambda, we called these stages Init
and Invoke. It's not uncommon to see cold starts that are over 10
seconds, especially if you're not careful. I've seen many tickets
about this in my time at Lambda too.
Quick detour on code/data reuse
I also wanted to include a quick interlude about how persistent
resources get reused, because I think it's practical for those new to
writing these Functions.
Class members
We talked a little bit about Init and Invoke. Intuitively, you'd
think: well, I'm creating some resources, let's say a DB client, during
the Init phase, but how do they get reused? Do I instantiate a new
client per request, or can we reuse resources across invokes? The
answer, thankfully, is that you do in fact reuse these resources
across invokes. If you initialize your DB client outside of your
function definition, it will be reused the next time you invoke, so
members of your class probably behave as you would expect coming from a
more serverful background. Let me illustrate this with a quick example.
class Function:
    db_client = None

    def handle_invoke(self, event, context):
        # The class attribute persists across invokes in the same
        # execution environment, so the client is only created once.
        if Function.db_client is None:
            Function.db_client = DbClient()
        return Function.db_client.get_some_stuff()
On your first invoke, you will instantiate a new DbClient,
but on subsequent requests to the same execution environment,
db_client will already be instantiated, so you will be able to call
get_some_stuff directly.
The more perceptive readers may notice a potential problem here. If
you're following security best practices and using ephemeral credentials
that expire after some hours, you could experience credential issues by
reusing these clients. If you're using the AWS SDK, this is generally
handled for you, but otherwise it's something to keep in mind as you're
developing.
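A minimal sketch of handling this yourself, reusing the hypothetical DbClient from the earlier example and assuming a made-up one-hour credential lifetime:

import time

CLIENT_TTL_SECONDS = 3600  # assumed credential lifetime, for illustration

db_client = None
client_created_at = 0.0

def get_db_client():
    global db_client, client_created_at
    # Rebuild the client if it doesn't exist yet or its credentials
    # are likely to have expired.
    if db_client is None or time.time() - client_created_at > CLIENT_TTL_SECONDS:
        db_client = DbClient()
        client_created_at = time.time()
    return db_client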
File System
This is a quick one. If you use EFS with Lambda, or by default with Azure
Functions, your invokes all share the same file system. This was
originally meant for machine learning workflows, where people have
relatively large models they want to load, but it has plenty of
other uses.
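As a minimal sketch of the pattern (the mount path and file name are assumptions; with Lambda, you pick the mount point when you attach EFS):

MODEL_PATH = "/mnt/efs/model.bin"  # assumed EFS mount point and file

model = None

def handle_invoke(event, context):
    global model
    # Every execution environment sees the same file system, so the
    # large file only needs to be placed there once, not per sandbox.
    if model is None:
        with open(MODEL_PATH, "rb") as f:
            model = f.read()
    return {"model_bytes": len(model)}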
Best practices
Here is a collection of best practices that I've found to be less commonly
written about, but really helpful with design.
Keep functions short (like less than 2 minutes)
If you read the Azure Functions best practices, they say that you should
keep your functions short because they time out. This is true, of course.
Lambda used to time out at 5 minutes, but has since extended that to 15
minutes. So you might think: well, if I limit my function to 10 minutes,
I should be pretty safe. Unfortunately, there is more nuance here.
Earlier, I introduced the concept of concurrency and Little's Law.

L = λW

L is the concurrency, λ is the effective arrival rate
(requests per second), and W is the wait time (latency).
For example, the largest throttling point for Lambda in one of the
regions is 3000 concurrent invokes. If you consider a 10 minute function
(600 seconds), we can calculate λ, the requests per second at which
you'll be throttled.
3000 = λ × 600
λ = 3000 / 600
λ = 5
Oops, just 5?
From the equation itself, you can see that the higher the latency, the
lower the requests per second your function will be able to handle. Even
using the default numbers, λ decreases unintuitively quickly as W
increases.
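To make that concrete, here is the same calculation for a few durations against the 3000-invoke limit:

CONCURRENCY_LIMIT = 3000

# Little's Law rearranged: lambda = L / W
for duration_seconds in [0.1, 1, 60, 600, 900]:
    max_rps = CONCURRENCY_LIMIT / duration_seconds
    print(f"{duration_seconds}s -> {max_rps:,.0f} requests/second")

# A 0.1s function sustains 30,000 rps, but at the 15 minute
# cap (900s) you're throttled at roughly 3 rps.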
Generally speaking, Lambda is designed to execute short functions, and
other tools such as AWS Batch or ECS are better suited for longer
running jobs.
Cold start related
Let's dig a little deeper into the steps of Init. Generally speaking,
no matter the serverless environment, we have to do these things:
- Acquire an execution sandbox
- Pull the code/executable into the sandbox
- Start the executable runtime, e.g. the JVM or CLR
- Run your init code
Keep functions small (like under 50MB)
We've already talked about cold starts. One of the most
unintuitively slow parts of a cold start is actually pulling your
code/binary. I suggested the limit of 50MB or so mainly because lots of
people insist on using Java or C# to write functions, but in reality, if
you're using an interpreted runtime like Python or Node, you can easily
keep your code under 1MB. In this case, Java and C# will be something
like 50x slower, and that's just to download the executable.
Use an interpreted runtime for more predictable results
While we're on the topic of runtimes, try to use a fast runtime. The JVM
and CLR have a reputation for taking a long time to initialize; I would
routinely see such functions take upwards of 10 seconds to init.
While JVM and CLR languages generally execute faster than Node or
Python, if you're following the earlier advice about function
duration, you are spending a much higher proportion of your time in
Init, causing more latency instability when you do hit a cold start. In
my experience, Node has been a good choice for having a more consistent
experience. You can also bypass the runtime completely and pick a
compiled language like Golang or Rust[2]. I've been on teams that
used Golang to great effect with Lambda as well.
Be mindful of price
It's easy to get lost in the ease of using Functions and forget how
much you're spending. I've accidentally spent thousands of dollars (of
Amazon's money) in just a few days. The pricing model is a constant
cost per invoke plus a rate on GB-seconds, where the GB is how much memory
you've allocated (not how much you're using) and the seconds of invocation
are metered to the millisecond. This means the less you use, the
less you pay, but also conversely, the more you use, the more you pay.
It'd be good to look up how much it would cost to get a VM or
Kubernetes to do the same job and make sure you're willing to pay the
excess. I've found that the point where Lambda starts costing more
comes at a much lower concurrency than people generally think. Of
course, Functions do more than VMs: they scale automagically, handle OS
patching, etc., so it may be worth it to you, but you should at least
know how much you're paying for that.
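As a back-of-the-envelope sketch, here is what that pricing model looks like. The rates below are illustrative, roughly in line with published x86 Lambda pricing as I write this; always check the current pricing page:

PRICE_PER_MILLION_INVOKES = 0.20    # USD, illustrative rate
PRICE_PER_GB_SECOND = 0.0000166667  # USD, illustrative rate

invokes_per_month = 10_000_000
memory_gb = 1.0               # what you allocate, not what you use
avg_duration_seconds = 0.5

invoke_cost = invokes_per_month / 1_000_000 * PRICE_PER_MILLION_INVOKES
compute_cost = (invokes_per_month * memory_gb * avg_duration_seconds
                * PRICE_PER_GB_SECOND)

print(f"invokes: ${invoke_cost:.2f}, compute: ${compute_cost:.2f}")
# invokes: $2.00, compute: $83.33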
Assorted Tips
Lambda currently doesn't charge for Inits under 10s
Lambda doesn't really like people spreading this particular tidbit, but
it is relatively widely known now. In order to optimize cold start
times, your Init phase is generally run on a more powerful sandbox,
and you don't have to pay for it. The caveat is that if you spend
more than 10 seconds in the Init phase, the sandbox is restarted and
you will be charged for Init. There is a blog post that went
relatively viral about this phenomenon:
https://hichaelmart.medium.com/shave-99-93-off-your-lambda-bill-with-this-one-weird-trick-33c0acebb2ea.
Init can happen before you actually make an invoke request
This one surprises customers from time to time, but in order to optimize
cold starts, the platform can initialize your execution environment well
before you actually make an invoke, especially if you make use of
dependency injection features. Reworking the previous example a little
bit to show a common way that this comes up:
class Function:
    def __init__(self):
        self.logger = SomeCustomLogger()
        self.db_client = DbClient()
        self.logger.logInfo("db_client is initialized")

    def handle_invoke(self, event, context):
        return self.db_client.get_some_stuff()
If we log the initialization of db_client, you can see that log
statement even if you did not make a request.
Don't use Task without returning something in Azure Functions (when you can)
This is more of a quirk of how async/await works in dotnet, but plain
Task functions are syntactic sugar for void. Since execution metadata
is saved in the Task object, if you use just plain Task, you return
void and you lose all your execution metadata. For example, if you
return Task, all Exceptions are suppressed because there is no way to
return them to the caller. For that reason, I suggest at least
returning something like Task<bool>, which will throw Exceptions as
expected.
[1] Actually, within Lambda we had several levels of cold starts,
which I get into a little bit later, but we're keeping it simple
for now.
[2] Shoutout to https://github.com/awslabs/aws-lambda-rust-runtime.
But you'll probably have issues with Rust's really large compiled
binaries.