Deploy Azure DevOps Self-Hosted Build Agents on Kubernetes (AKS) and scale them using KEDA

lakkimartin

Martin

Posted on June 22, 2022

Overview

KEDA is an event-driven autoscaler for Kubernetes that allows you to scale containers based on events.

It is a lightweight, single-purpose component that can be added to any Kubernetes cluster. KEDA can also work alongside the Kubernetes Horizontal Pod Autoscaler.

The diagram below (taken from the KEDA docs) shows how KEDA integrates with the Horizontal Pod Autoscaler, external events and the Kubernetes API:

Image description

More information on KEDA can be found in the official docs.

In this tutorial we will go over how you can deploy self-hosted build agents on to an Azure Kubernetes Service (AKS) cluster and scale them using KEDA (scaled jobs) based on the number of jobs in a build queue.

As a side note, Azure Container Apps also supports KEDA and scaling using the pipelines trigger. However, without scaled jobs KEDA can kill containers halfway through a job when it scales in, and scaled jobs are not yet supported in Container Apps.

All of the code for this project can be found on my GitHub page here.

Tools Required

We will be using the following tools, so make sure you have them installed on your local machine.

  • Helm (Kubernetes Package Manager) - Install Guide
  • Azure CLI - Install Guide
  • Kubernetes Command Line - Install Guide
  • Terraform (Infrastructure as Code) - Install Guide
  • Docker Desktop - Install Guide

Deploy Azure resources

Before we get started we need to deploy the Azure components that will host our solution. These are the following:

  • Azure Resource Group
  • AKS Cluster
  • Container Registry

The Kubernetes cluster will be basic with a single pool and single virtual machine. No advanced networking will be used for the purposes of proving the concept.

We will be deploying this through Terraform which is an infrastructure as code deployment tool. The Terraform deployment file can be found in my Git repository here.

Step 1: Fork the repository and create a local clone on your machine

Step 2: Navigate to the folder that contains the main.tf file and run:



terraform init



Image description

This will initialise the Terraform working directory and download the required providers. The local state file is created when the resources are applied.

Step 3: Now we are going to run terraform plan, which lists the resources that will be deployed. There should be 4 in total: the three listed above plus a role assignment that links AKS to ACR.



terraform plan



Image description

Step 4: Once we are happy with what Terraform is going to deploy we can run the apply stage which will deploy the resources into Azure.



terraform apply



Image description

You should now see the resources in the Azure portal:

Image description

Build the Docker image

Next we need to build the Docker image for the Azure DevOps self-hosted agents. Microsoft have documented this quite well for Docker here. I have modified the image slightly to include PowerShell, and we are using run.sh rather than docker in the start.sh script because containerd replaced Docker as the container runtime in AKS version 1.19 and higher. You can re-use my images for these next steps from my GitHub repo.

Step 1: Start up Docker Desktop.

Step 2: Navigate to the repository you cloned earlier and open the folder that contains the Docker image.

Step 3: Let's build the image and tag it:



docker build -f <dockerfile-path> -t <tag> .



You can test the image by running the container locally which will register it in DevOps. Just supply the environment variables.



docker run -e AZP_URL=<organisation-url> -e AZP_TOKEN=<pat-token> -e AZP_AGENT_NAME=<agent-name> -e AZP_POOL=<pool-name> <image>



Step 4: Let's log in to the Azure Container Registry.



docker login <login-server> -u <username> -p <password>



You can get the above parameters from the container registry in the portal:

Image description

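Step 5: Let's push the image to the Azure Container Registry. The image name must be prefixed with the registry login server (<login-server>/<imagename>:<tag>) so that Docker knows where to push it.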



docker push <imagename:tag>



The image should now be in the container registry, which allows the containers running the agents to pull it.
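The registry-qualified image name can be composed with a small helper. This is only a sketch: `acr_image_ref` is a hypothetical function name, and the registry, repository and tag values in the usage comments are placeholders rather than values from the repository.

```shell
# Compose a fully qualified image reference for Azure Container Registry.
# acr_image_ref is a hypothetical helper; all arguments are placeholders.
acr_image_ref() {
  acr_name="$1"    # registry name, e.g. myregistry
  repository="$2"  # repository name, e.g. azdevops-agent
  tag="$3"         # image tag, e.g. v1
  printf '%s.azurecr.io/%s:%s\n' "$acr_name" "$repository" "$tag"
}

# Example usage (commented out so the snippet has no side effects):
# image="$(acr_image_ref myregistry azdevops-agent v1)"
# docker tag <local-image> "$image"
# docker push "$image"
```

The same reference is what the image field in the Kubernetes manifests would point at later on.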

Install KEDA on to the AKS cluster

KEDA runs in a container on the Kubernetes cluster and it's not built in, so we need to install it. The KEDA pod handles all of the event-driven scaling. We are going to use Helm for the install, however you can also apply the manifests directly.

Make sure you are authenticated to the AKS cluster before running these next steps.
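One way to authenticate is through the Azure CLI, which merges the cluster credentials into your local kubeconfig. The sketch below assumes you are already logged in with `az login`; `aks_login` is a hypothetical helper name and both arguments are placeholders for the resource group and cluster created by Terraform.

```shell
# Fetch kubeconfig credentials for the AKS cluster, then confirm that
# kubectl can reach it. aks_login is a hypothetical helper name.
aks_login() {
  resource_group="$1"  # resource group that contains the cluster
  cluster_name="$2"    # name of the AKS cluster
  az aks get-credentials --resource-group "$resource_group" --name "$cluster_name" &&
    kubectl get nodes
}

# Example usage:
# aks_login <resource-group> <cluster-name>
```

If `kubectl get nodes` lists the node from the single node pool, the Helm commands that follow should target the right cluster.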

Step 1: Add the helm repo



helm repo add kedacore https://kedacore.github.io/charts



Step 2: Update the helm repo



helm repo update



Step 3: Create a new namespace and install the KEDA helm chart



kubectl create namespace keda
helm install keda kedacore/keda --namespace keda



Image description

You should now see the KEDA pods running in the keda namespace:

Image description

We are now ready to start using KEDA for scaling our containers.

Deploying the agents

Set up a new Agent Pool in Azure DevOps

Before we apply the Kubernetes manifests we need to set up a new agent pool in DevOps.

Image description

Step 1: Set up a self hosted pool on the organisation level. Take a note of the pool id. This can be found in the URL when you select the pool:

https://dev.azure.com/organisation/_settings/agentpools?poolId=16&view=jobs

In this case it's 16.
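If you want to script this step, the pool ID can be pulled out of the URL with sed. This is a minimal sketch; `pool_id_from_url` is a hypothetical helper name, and the URL is the example shown above.

```shell
# Extract the poolId query parameter from an Azure DevOps agent pool URL.
# pool_id_from_url is a hypothetical helper name.
pool_id_from_url() {
  printf '%s\n' "$1" | sed -n 's/.*[?&]poolId=\([0-9][0-9]*\).*/\1/p'
}

# Prints 16 for the example URL from this step.
pool_id_from_url 'https://dev.azure.com/organisation/_settings/agentpools?poolId=16&view=jobs'
```

The helper prints nothing when the URL has no poolId parameter, so an empty result is a sign that the wrong page was copied.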

Step 2: You will also need to generate a personal access token (PAT) with Agent Pools (read & manage) permissions.

Step 3: You will need to encode the token to Base64. You can do this either through Bash or through this website.
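Encoding in a shell keeps the token on your own machine. A minimal sketch, assuming the standard `base64` utility is available; `encode_pat` is a hypothetical helper name and the token in the usage comment is a placeholder.

```shell
# Base64-encode a personal access token without a trailing newline.
# encode_pat is a hypothetical helper name.
encode_pat() {
  # printf avoids the newline that echo would append to the input;
  # tr strips the line wrapping that some base64 implementations add.
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Example usage with a placeholder value:
# encode_pat '<pat-token>'
```

Using printf rather than echo matters here: a stray newline in the input would become part of the encoded token.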

Apply the manifests

Because we are going to be using scaled jobs, we cannot specify idle agents. This creates a problem, as you can't queue an Azure Pipelines job on an empty agent pool, so a workaround is to deploy a static agent and then disable it in DevOps.

Apply the deployment.yaml from the cloned repository by running:



kubectl apply -f ./deployment.yaml



Before applying, make sure you replace the variables in the YAML for the image, organisation URL, PAT and agent pool name. See the examples below:

  • image = <acr-name>.azurecr.io/<repository-name>:<tag>
  • AZP_URL = https://dev.azure.com/<organisation-name>
  • AZP_TOKEN = Base64 encoded token that we created earlier
  • AZP_POOL = Name of the self-hosted agent pool in Azure DevOps

Once you have applied the manifest you should see the pod running in the default namespace:

Image description

We can now see the agent running in DevOps:

Image description

Disable the agent in Azure DevOps and leave the pod running.

Apply the KEDA Scaled Job Manifest

The final step is to apply the scaled job object through our keda-scaled-jobs.yaml manifest. You will need to replace the values for the image, organisation, pool, token, and pool ID.



kubectl apply -f ./keda-scaled-jobs.yaml



Image description

Now that we have applied the scaled job object, KEDA will poll the build queue every 10 seconds (this can be customised). I have included a load testing pipeline for Azure DevOps that triggers 10 jobs which run some PowerShell loops. Let's run some jobs and see the scaling in action:

Pods are spinning up:

Image description

Agents are coming online:

Image description

We have now configured our self-hosted agents to run in Docker using Kubernetes as the orchestrator and KEDA as the scaling engine!

You can customise how often KEDA polls the job queue, the maximum number of replicas and more.
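To show where those settings live, here is a sketch of a ScaledJob for the azure-pipelines trigger, printed by a small shell function. It is not the manifest from the repository: `render_scaled_job` and the resource names are hypothetical, the `maxReplicaCount` value is an assumption, and the token is passed as a plain environment value for brevity, whereas the repository's manifest may read it from a Kubernetes Secret.

```shell
# Print a KEDA ScaledJob manifest for the azure-pipelines trigger.
# render_scaled_job and the resource names are hypothetical.
render_scaled_job() {
  image="$1"      # <acr-name>.azurecr.io/<repository-name>:<tag>
  org_url="$2"    # https://dev.azure.com/<organisation-name>
  pool_name="$3"  # name of the self-hosted agent pool
  pool_id="$4"    # numeric pool ID taken from the pool URL
  token="$5"      # PAT in the form the agent image expects
  cat <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: azdevops-scaledjob
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: azdevops-agent-job
            image: ${image}
            env:
              - name: AZP_URL
                value: "${org_url}"
              - name: AZP_TOKEN
                value: "${token}"
              - name: AZP_POOL
                value: "${pool_name}"
        restartPolicy: Never
  pollingInterval: 10    # seconds between checks of the build queue
  maxReplicaCount: 10    # assumed upper bound on concurrent agent jobs
  triggers:
    - type: azure-pipelines
      metadata:
        poolID: "${pool_id}"
        organizationURLFromEnv: "AZP_URL"
        personalAccessTokenFromEnv: "AZP_TOKEN"
EOF
}

# Example usage: render the manifest and pipe it straight to kubectl.
# render_scaled_job <image> <organisation-url> <pool-name> <pool-id> <token> | kubectl apply -f -
```

Changing `pollingInterval` adjusts how often the queue is checked, and `maxReplicaCount` caps how many agent jobs can run at once.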

Limitations

I have discovered some limitations with KEDA and scaled jobs which I'm working on resolving:

  • If you run some jobs in DevOps and cancel them, the containers keep running.
  • KEDA does not remove any pods in a "completed" state. This can be resolved with a custom clean-up shell script running as a cron on the cluster, however it may be possible to do it with KEDA.
  • Offline agents are not removed from the DevOps pool. I have a cron job running in Azure DevOps that cleans this up.
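The clean-up script for completed pods could look roughly like the sketch below. It is not the script I run: `cleanup_completed_pods` is a hypothetical helper name, and it assumes the agents run in the default namespace unless another one is passed in. Pods shown as "Completed" by kubectl have the phase Succeeded, which is what the field selector matches.

```shell
# Delete pods that have finished successfully in a namespace.
# cleanup_completed_pods is a hypothetical helper name.
cleanup_completed_pods() {
  namespace="${1:-default}"  # falls back to the default namespace
  kubectl delete pods --namespace "$namespace" --field-selector=status.phase==Succeeded
}

# Example usage, e.g. from a cron entry:
# cleanup_completed_pods default
```

The ScaledJob spec also has successfulJobsHistoryLimit and failedJobsHistoryLimit fields, so lowering those may make a separate script unnecessary.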

Next Steps

All of the above steps can of course be automated through a simple build pipeline, either through GitHub Actions or Azure DevOps pipelines.

I will be building all of this functionality into a Helm chart that will also support GitHub runners, so keep an eye out!
