How to build a tiny Flask Docker image for Cloud Run

Hey I know what Cloud Run is, just get to the point!

I'd like to know what Cloud Run is

Cloud Run is a fully managed product within the Google Cloud (GCP) suite which allows you to deploy containerized serverless applications.

Is it like AWS Lambda?

Well... sort of. AWS Lambda is a fully managed platform for deploying serverless applications. However, with Lambda you only need to write the code; you don't build or ship a container yourself. The GCP equivalent to Lambda is Cloud Functions. The closest AWS equivalent to Cloud Run would be Fargate.

Why don't I just use AWS Fargate?

The standard way to run a Fargate application is to have at least one task running at all times. As traffic increases, service auto scaling adds more tasks to handle the load. If you have an application with lots of traffic and can afford to have a task running 24/7 then sure, go for it!

If you'd like to deploy a containerized application with the ability to scale to zero (and so only pay for what you use) then read on my friend, because Cloud Run is for you.

Erm OK, so why don't I just use AWS Lambda?

Hey no arguments here. But there are some cool benefits.

Concurrency

Where Lambda will spin up a new instance for every concurrent request, a single Cloud Run instance can service more than one request at a time. When the traffic is too much for that instance, a second instance is spun up, and so on. Take the simple example of 2 users sending requests at almost the same time. With Lambda, both users experience a cold start (waiting for an instance to load up). With Cloud Run, only one container initializes, so the first user still has to wait for the instance to boot up, but the second user enjoys the already warm instance.
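To put numbers on the two-user example, here is a toy model (a sketch, not how either platform actually schedules work). It only counts how many instances have to boot to absorb a burst when starting from zero. The function name and the concurrency limit of 10 are made up for illustration.

```python
import math


def instances_needed(in_flight_requests: int, max_concurrency: int) -> int:
    """Instances (and so cold starts) needed to absorb a burst from zero."""
    if in_flight_requests < 0 or max_concurrency < 1:
        raise ValueError("need in_flight_requests >= 0 and max_concurrency >= 1")
    return math.ceil(in_flight_requests / max_concurrency)


# Two users arriving at almost the same time.
one_request_per_instance = instances_needed(2, max_concurrency=1)
shared_instance = instances_needed(2, max_concurrency=10)  # illustrative limit

print(one_request_per_instance, shared_instance)
```

With one request per instance, two instances boot. With a shared instance, only one does.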

Developer Freedom

Because you ship a container, you can write your application in any language, with any libraries, or even your own binaries.

Lambda does allow custom runtimes, but if that's not enough then a Docker container is what you need.

Note: During re:Invent 2020, AWS announced the ability to deploy your own containers to Lambda. So I guess it's just concurrency?

That was confusing, what is Cloud Run?

It's a platform that allows you to deploy serverless containerized applications where you only get billed while your application is responding to requests.

Dockerfile

Here is an example Dockerfile to build your Flask application.

Those with a keen eye will note this is a multi-stage build. This lets us install everything required to build the app in one stage and then copy only the built application into a second stage. We deploy precisely what is required to run the app and nothing else, which gives huge savings on size.

The next thing to note is that the second stage uses a Distroless Python image as the base image. This is a super lean image which contains the barest of bare essentials to run our Flask application.

Application Code

Ok, so let's look at a sample app. Here is our app.py, a simple Flask app.
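The original listing isn't reproduced here, so this is a minimal sketch of what such an app.py could look like, not necessarily the author's file. The route and the response text are made up for illustration.

```python
from flask import Flask

# The WSGI application object; a separate entry point can import it by name.
app = Flask(__name__)


@app.route("/")
def index():
    """Return a plain-text greeting for the root URL."""
    return "Hello from Cloud Run!"
```

Note there is no call to app.run() here. Starting the server is left to a separate entry point.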

Wait a minute, the Dockerfile makes no mention of an app.py but rather a wsgi.py.

Waitress

If you've used Flask before, hopefully you know not to use the built-in development server in production. Why not, you say? Because in plain terms, it isn't designed to be efficient, stable, or secure under real traffic, which would kill our whole selling point of concurrency.

Enter Waitress: a production-quality WSGI server. The benefit of Waitress over some other WSGI servers (e.g. uWSGI) is that Waitress is written purely in Python and has no dependencies outside the standard library, so we don't need to install any additional system libraries onto our image. This means we can keep it nice and small (Yay!) and serve requests concurrently (Woo!)

Here's an example wsgi.py file
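The original listing isn't reproduced here either, so treat this as a sketch under a few assumptions: app.py exposes a Flask object named app, and the container should listen on the port given in the PORT environment variable (Cloud Run sets this), falling back to 8080. The helper names listen_port and run are made up for illustration.

```python
import os


def listen_port(environ=os.environ, default=8080):
    """Return the port to bind to: PORT if set, otherwise the default."""
    return int(environ.get("PORT", default))


def run():
    """Start Waitress and block until the process is stopped."""
    # Imported here so the helper above can be exercised without Waitress
    # or the (assumed) app.py module being importable.
    from waitress import serve
    from app import app  # assumes app.py defines a Flask object named `app`

    # Bind to all interfaces so the container's port is reachable from outside.
    serve(app, host="0.0.0.0", port=listen_port())
```

To use this as the container entry point, finish the file with a call to run(), typically under an if __name__ == "__main__": guard. It is left out of the sketch so nothing starts a server just by loading the file.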

How to deploy?

Google has plenty of documentation on this so I'm not going to rehash all of that.

Why bother doing all this?

The Python slim Docker image comes to around 60MB compressed, but with Distroless you can get this down to around 20MB. That might seem a bit "who cares" but let's look at it from another angle.

On the front end, developers have been scrimping and saving on file sizes for years, with uglifying and compression, so that pages load faster for the end user. It's all about that Time To First Paint. Here every KB counts, and all sorts of DNS pre-fetching and pre-loading goes on to get the content to the user as quickly as possible.

Serverless applications are still a relatively new paradigm. Classically (and still currently) a website will be serviced by a dedicated server, meaning that boot times aren't an issue because the server ideally boots once and then runs forever.

However, in the serverless world, when a request comes in and no instance is warm, a new instance is provisioned, the application image is downloaded onto it, and then it is booted up. So now size matters: at a third of the size, our Distroless image should download roughly 3 times faster than the slim one! Then obviously the next area to look at is application start-up time, but that's beyond the scope of this article.
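As a back-of-the-envelope check on the size argument, here is the arithmetic using the 60MB and 20MB figures from earlier. The 50 MB/s bandwidth is a made-up number purely for illustration, and the sketch ignores everything except the download itself (no layer caching, no decompression, no application start-up).

```python
def pull_seconds(image_mb: float, bandwidth_mb_per_s: float) -> float:
    """Time to download an image of the given size at a fixed bandwidth."""
    if bandwidth_mb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return image_mb / bandwidth_mb_per_s


BANDWIDTH_MB_PER_S = 50.0  # hypothetical; real pull speeds will differ

slim = pull_seconds(60, BANDWIDTH_MB_PER_S)
distroless = pull_seconds(20, BANDWIDTH_MB_PER_S)

print(f"slim: {slim:.2f}s, distroless: {distroless:.2f}s, "
      f"ratio: {slim / distroless:.1f}x")
```

Whatever bandwidth you plug in, the ratio stays the same because it only depends on the two image sizes.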

Summary

Multi stage builds, Distroless and Waitress are your friends here.
