Deploy an ML model using Google Cloud Run, GitHub Actions and Terraform
Alvaro Valarezo de la Fuente
Posted on October 25, 2022
In this post, I will explain how to expose a trained model as a Python API using FastAPI, and how to apply CI/CD best practices (GitHub Actions) and IaC (Terraform) to automate infrastructure creation.
Prerequisites
- Docker Desktop
- Git
- A GitHub account
- A Google Cloud Platform project with Owner permissions
- Clone this repo
Google Cloud Run
Cloud Run is a serverless platform from Google Cloud to deploy and run containers. Cloud Run can be used to serve RESTful web APIs, WebSocket applications, or microservices connected by gRPC.
In this project we will need:
- An IAM account with permissions to create a service account
- Cloud Storage Admin permissions
- Container Registry Admin permissions
- Cloud Run Admin permissions
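As a sketch, this is how those roles could be granted with the gcloud CLI. The project ID my-project, the account name deployer, and the exact role IDs are my assumptions; adjust them to your setup:

```bash
# "my-project" and "deployer" are hypothetical placeholder names.
gcloud iam service-accounts create deployer --project=my-project

# Grant the roles this project relies on (role IDs are assumptions).
for role in roles/storage.admin roles/run.admin; do
  gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:deployer@my-project.iam.gserviceaccount.com" \
    --role="$role"
done
```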
In case you don't want to expose the API for public access, remove the resource "google_cloud_run_service_iam_member" "run_all_users" from terraform/main.tf, as sketched below.
You can then grant access to specific IAM accounts through the Google Cloud Run UI or through Terraform. This approach doesn't add any latency for the customer because it relies on Google Cloud's built-in IAM roles and permissions.
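For reference, the public-access resource typically looks like this (field values are assumptions based on the Google provider's standard example, not the repo's exact file):

```hcl
# Grants unauthenticated (public) access to the service; remove or restrict it.
resource "google_cloud_run_service_iam_member" "run_all_users" {
  service  = google_cloud_run_service.default.name      # assumed resource name
  location = google_cloud_run_service.default.location
  role     = "roles/run.invoker"
  member   = "allUsers"  # replace with e.g. "user:alice@example.com" to restrict
}
```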
Terraform
Terraform is a popular open-source tool for managing infrastructure as code. It uses HCL (HashiCorp Configuration Language), a declarative language for describing infrastructure.
The basic flow is:
- `terraform init`: Initializes the plugins, backend, and config files Terraform uses to keep track of the infrastructure.
- `terraform plan`: Generates an execution plan for all the infrastructure declared in terraform/main.tf.
- `terraform apply`: Applies all the changes that were in the plan.
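To run the same flow locally from the terraform/ directory, the sequence looks like this (a sketch; any remote-state/backend configuration is omitted):

```bash
cd terraform
terraform init              # download providers and set up state tracking
terraform plan -out=tfplan  # preview the changes declared in main.tf
terraform apply tfplan      # apply exactly the changes from the saved plan
```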
All of these steps are declared in .github/workflows/workflow.yaml.
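As an illustration, a minimal workflow of this shape might look like the following. The action names, versions, and job layout here are my assumptions, not the repo's exact file:

```yaml
name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # Authenticate to GCP with the service account key stored in GitHub Secrets.
      - uses: google-github-actions/auth@v1
        with:
          credentials_json: ${{ secrets.GCLOUD_SERVICE_KEY }}
      - uses: hashicorp/setup-terraform@v2
      # Run the Terraform flow described above.
      - run: terraform init
        working-directory: terraform
      - run: terraform apply -auto-approve
        working-directory: terraform
```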
Machine Learning Model
The model is a logistic regression serialized as a pickle file. It takes as input a list of 37 parameters and returns a number between 0 and 1 representing the probability that the flight is delayed.
E.g.:

```json
{
  "test_array": [0,0,0,0,0,1,0,1,0,0,1,0,1,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,1,1,0,1,0,1,0]
}
```

Returns: `0`
In this case it means the flight isn't going to be delayed.
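To make this concrete, here is a minimal sketch of how such an endpoint could be served with FastAPI. The model path and the internals are my assumptions, not the repo's actual code; only the /predict/ route and the test_array field come from the examples above:

```python
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Hypothetical path; the repo's serialized model may live elsewhere.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

class PredictionRequest(BaseModel):
    test_array: list[int]  # the 37 input parameters

@app.post("/predict/")
def predict(request: PredictionRequest):
    # scikit-learn-style models expect a 2D array: one row per sample.
    prediction = model.predict([request.test_array])[0]
    return {"prediction": int(prediction)}
```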
Run it locally
- Fork the repo
- Clone it to your computer
- Run `docker build -t ml-api .` in the root of the project to build the image of the API.
- Run `docker run -d --name ml -p 80:8080 ml-api` to create a container from the ml-api image you just built.
- Open localhost to test the project.
- On the /predict/ POST endpoint, you can use this body as an example:
```json
{
  "test_array": [0,0,0,0,0,1,0,1,0,0,1,0,1,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,1,1,0,1,0,1,0]
}
```
- You should expect a 200 response with `"prediction": 0`, which means the flight wasn't delayed.
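From the command line, you can exercise the endpoint like this (assuming the container is mapped to port 80 as above):

```bash
curl -X POST http://localhost/predict/ \
  -H "Content-Type: application/json" \
  -d '{"test_array": [0,0,0,0,0,1,0,1,0,0,1,0,1,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,1,1,0,1,0,1,0]}'
```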
How to deploy it with your GCP account
- Generate a Service Account key and upload it to GitHub Secrets as `GCLOUD_SERVICE_KEY` (see the sketch after this list)
- Push any change to the main branch
- That's it! :)
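The key can be generated with the gcloud CLI, for example (the service account email below is a hypothetical placeholder):

```bash
# deployer@my-project.iam.gserviceaccount.com is a hypothetical account.
gcloud iam service-accounts keys create key.json \
  --iam-account=deployer@my-project.iam.gserviceaccount.com
# Paste the contents of key.json into the GCLOUD_SERVICE_KEY secret.
```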
A bit of stress testing for the API
- On macOS, install wrk: `brew install wrk`
- Run `wrk -t12 -c200 -d45s -s request.lua https://mlops-api-backend-1-5gdi5qltoq-uc.a.run.app/predict/` to open 12 threads with 200 open HTTP connections for 45 seconds.
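The request.lua script tells wrk what request to send; a minimal version might look like this (the exact script in the repo may differ, and the body below is the example payload used earlier in this post):

```lua
-- Sketch of a request.lua for wrk: POST the example payload as JSON.
wrk.method = "POST"
wrk.headers["Content-Type"] = "application/json"
wrk.body = '{"test_array": [0,0,0,0,0,1,0,1,0,0,1,0,1,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,1,1,0,1,0,1,0]}'
```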
How can we improve the results
The best approach would be horizontal scaling: in this case, we can create a second Google Cloud Run service and use load balancing to distribute the traffic between both instances.