Deploy Azure DevOps Self-Hosted Build Agents on Kubernetes (AKS) and scale them using KEDA
Martin
Posted on June 22, 2022
Overview
KEDA is an event-driven autoscaler for Kubernetes that allows you to scale containers based on events.
It is a lightweight, single-purpose component that can be added to any Kubernetes cluster, and it can work alongside the Kubernetes Horizontal Pod Autoscaler.
The diagram below (taken from the KEDA docs) shows how KEDA integrates with the Horizontal Pod Autoscaler, external event sources and the Kubernetes API:
More information on KEDA can be found in the official docs.
In this tutorial we will go over how you can deploy self-hosted build agents onto an Azure Kubernetes Service (AKS) cluster and scale them using KEDA (scaled jobs) based on the number of jobs in the build queue.
As a side note, Azure Container Apps also supports KEDA and scaling with the pipelines trigger; however, KEDA kills containers halfway through a job, and scaled jobs are not yet supported in Container Apps.
All of the code for this project can be found on my GitHub page here.
Tools Required
We will be using the following tools so make sure you have them installed on your local machine.
- Helm (Kubernetes Package Manager) - Install Guide
- Azure CLI - Install Guide
- Kubernetes Command Line - Install Guide
- Terraform (Infrastructure as Code) - Install Guide
- Docker Desktop - Install Guide
Deploy Azure resources
Before we get started we need to deploy the Azure components that will host our solution. These are the following:
- Azure Resource Group
- AKS Cluster
- Container Registry
The Kubernetes cluster will be basic with a single pool and single virtual machine. No advanced networking will be used for the purposes of proving the concept.
We will be deploying this through Terraform, which is an infrastructure-as-code deployment tool. The Terraform deployment file can be found in my Git repository here.
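As a rough outline, the main.tf covers something like the following. This is a simplified sketch of my own; the resource names, locations and SKUs in the repository may differ:

    provider "azurerm" {
      features {}
    }

    resource "azurerm_resource_group" "rg" {
      name     = "rg-keda-agents"
      location = "uksouth"
    }

    resource "azurerm_container_registry" "acr" {
      name                = "kedaagentsacr"
      resource_group_name = azurerm_resource_group.rg.name
      location            = azurerm_resource_group.rg.location
      sku                 = "Basic"
    }

    resource "azurerm_kubernetes_cluster" "aks" {
      name                = "aks-keda-agents"
      resource_group_name = azurerm_resource_group.rg.name
      location            = azurerm_resource_group.rg.location
      dns_prefix          = "kedaagents"

      # Single pool, single VM - enough to prove the concept
      default_node_pool {
        name       = "default"
        node_count = 1
        vm_size    = "Standard_DS2_v2"
      }

      identity {
        type = "SystemAssigned"
      }
    }

    # Role assignment so the AKS kubelet identity can pull images from ACR
    resource "azurerm_role_assignment" "acr_pull" {
      principal_id         = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
      role_definition_name = "AcrPull"
      scope                = azurerm_container_registry.acr.id
    }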
Step 1: Fork the repository and create a local clone on your machine
Step 2: Navigate to the folder that contains the main.tf file and run:
terraform init
This will initialise Terraform and create the local state file.
Step 3: Now we are going to run terraform plan, which will list the resources that will be deployed. It should be four in total: the three resources listed above plus a role assignment that links AKS to ACR.
terraform plan
Step 4: Once we are happy with what Terraform is going to deploy we can run the apply stage which will deploy the resources into Azure.
terraform apply
You should now see the resources in the Azure portal:
Build the Docker image
Next we need to build the Docker image for the Azure DevOps self-hosted agents. Microsoft have documented this quite well for Docker here. I have modified the image slightly to include PowerShell, and we are using run.sh rather than docker in the start.sh script, as containerd is the container runtime in AKS version 1.19 and higher. You can re-use my images for these next steps from my GitHub repo.
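For orientation, the Dockerfile is broadly along these lines. This is a condensed sketch based on Microsoft's documented agent image; the base image, package list and PowerShell install method here are my own assumptions, so check the repository for the real thing:

    FROM ubuntu:20.04

    # Base dependencies the agent needs
    RUN apt-get update && apt-get install -y --no-install-recommends \
            curl git jq ca-certificates libicu66 \
        && rm -rf /var/lib/apt/lists/*

    # PowerShell added on top of the documented image (one possible install method)
    RUN curl -sSL https://aka.ms/install-powershell.sh | bash

    WORKDIR /azp

    # start.sh downloads the agent and ends by calling ./run.sh,
    # which is what works on containerd-based AKS nodes
    COPY ./start.sh .
    RUN chmod +x start.sh

    ENTRYPOINT ["./start.sh"]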
Step 1: Start up Docker Desktop
Step 2: Navigate to the repository you cloned earlier and navigate to the folder that contains the docker image.
Step 3: Let's build the image and tag it:
docker build -f <dockerfile-path> -t <acr-name>.azurecr.io/<repository-name>:<tag> .
You can test the image by running the container locally, which will register an agent in DevOps. Just supply the environment variables:
docker run -e AZP_URL=<organisation-url> -e AZP_TOKEN=<pat> -e AZP_AGENT_NAME=<agent-name> -e AZP_POOL=<pool-name> <image>
Step 4: Let's login to the Azure Container Registry.
docker login <login-server> -u <username> -p <password>
You can get the above parameters from the container registry in the portal:
Step 5: Let's push the image to the Azure Container Registry
docker push <acr-name>.azurecr.io/<repository-name>:<tag>
The image should now be in the container registry, which allows the containers running the agents to pull it.
Install KEDA on to the AKS cluster
KEDA runs in a container on the Kubernetes cluster and it's not built in, so we need to install it. The KEDA pods handle all of the event-driven scaling. We are going to use Helm for the install; however, you can apply the manifests directly.
Make sure you are authenticated to the AKS cluster before running these next steps.
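If you deployed the cluster with the Terraform above, something like this pulls the cluster credentials into your kubeconfig (use whatever resource group and cluster names you chose):

    az aks get-credentials --resource-group <resource-group-name> --name <aks-cluster-name>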
Step 1: Add the helm repo
helm repo add kedacore https://kedacore.github.io/charts
Step 2: Update the helm repo
helm repo update
Step 3: Create a new namespace and install the KEDA helm chart
kubectl create namespace keda
helm install keda kedacore/keda --namespace keda
You should now see the KEDA pods running in the keda namespace:
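You can also check from the command line:

    kubectl get pods --namespace keda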
We are now ready to start using KEDA for scaling our containers.
Deploying the agents
Set up a new Agent Pool in Azure DevOps
Before we apply the Kubernetes manifests we need to set up a new agent pool in DevOps.
Step 1: Set up a self-hosted pool at the organisation level. Take a note of the pool ID. This can be found in the URL when you select the pool:
https://dev.azure.com/organisation/_settings/agentpools?poolId=16&view=jobs
In this case it's 16.
Step 2: You will also need to generate a PAT token with Agent Pools read & manage permissions.
Step 3: You will need to encode the token to Base64. You can do this either through Bash or through this website.
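For example, a quick Bash one-liner (the -n flag stops echo from appending a newline to the token before it is encoded):

    echo -n '<pat-token>' | base64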
Apply the manifests
Because we are going to be using scaled jobs, we cannot specify idle agents. This creates a problem because you can't queue an Azure Pipelines job on an empty agent pool, so a workaround is to deploy a static agent and take it offline.
Apply the deployment.yaml from the cloned repository by running:
kubectl apply -f ./deployment.yaml
Make sure you replace the variables in the YAML for the agent pool name, organisation URL, image and PAT token. See the examples below and the sketch that follows them:
- image = <acr-name>.azurecr.io/<repository-name>:<tag>
- AZP_URL = https://dev.azure.com/<organisation-name>
- AZP_TOKEN = the Base64-encoded token that we created earlier
- AZP_POOL = the name of the self-hosted agent pool in Azure DevOps
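For reference, here is a minimal sketch of what such a deployment might look like. The names and labels are placeholders of my own, and the manifest in the repository is the source of truth:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: azdevops-agent
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: azdevops-agent
      template:
        metadata:
          labels:
            app: azdevops-agent
        spec:
          containers:
          - name: azdevops-agent
            image: <acr-name>.azurecr.io/<repository-name>:<tag>
            env:
            - name: AZP_URL
              value: "https://dev.azure.com/<organisation-name>"
            - name: AZP_TOKEN
              value: "<base64-encoded-pat>"   # a Kubernetes Secret is the safer home for this
            - name: AZP_POOL
              value: "<pool-name>"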
Once you have applied the manifest you should see the pod running in the default namespace:
We can now see the agent running in DevOps:
Disable the agent and leave it running.
Apply the KEDA Scale Job Manifest
The final step is to apply the scaled job object through our keda-scaled-jobs.yaml manifest. You will need to replace the values for the image, organisation, pool, token, and pool ID.
kubectl apply -f ./keda-scaled-jobs.yaml
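For reference, a trimmed sketch of roughly what this manifest contains. The trigger metadata names follow the KEDA azure-pipelines scaler, but the placeholder values and the exact token handling are assumptions, so defer to the manifest in the repository:

    apiVersion: keda.sh/v1alpha1
    kind: ScaledJob
    metadata:
      name: azdevops-scaledjob
    spec:
      jobTargetRef:
        template:
          spec:
            containers:
            - name: azdevops-agent
              image: <acr-name>.azurecr.io/<repository-name>:<tag>
              env:
              - name: AZP_URL
                value: "https://dev.azure.com/<organisation-name>"
              - name: AZP_TOKEN
                value: "<pat>"           # how the token is stored/encoded is defined by the repo manifest
              - name: AZP_POOL
                value: "<pool-name>"
            restartPolicy: Never
      pollingInterval: 10                # how often KEDA checks the build queue, in seconds
      maxReplicaCount: 10                # upper limit on concurrent agent pods
      triggers:
      - type: azure-pipelines
        metadata:
          poolID: "16"                      # the pool ID taken from the agent pool URL
          organizationURLFromEnv: AZP_URL   # resolve the organisation URL from the container env
          personalAccessTokenFromEnv: AZP_TOKEN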
Now that we have applied the scaled object, KEDA will poll the build queue every 10 seconds (this can be customised). I have included a load-testing pipeline for Azure DevOps that triggers 10 jobs which run some PowerShell loops. Let's run some jobs and see the scaling in action:
Pods are spinning up:
Agents are coming online:
We have now configured our self-hosted agents to run in Docker using Kubernetes as the orchestrator and KEDA as the scaling engine!
You can customise how often KEDA monitors the job queue, the maximum number of replicas and more.
Limitations
I have discovered some limitations with KEDA and scaled jobs which I'm working on resolving:
- If you run some jobs in DevOps and cancel them, the containers keep running.
- KEDA does not remove pods in a "Completed" state. This can be resolved with a custom clean-up script running as a cron on the cluster (see the sketch after this list), although it may be possible to do it with KEDA.
- Offline agents are not removed from the DevOps pool. I have a cron job running in Azure DevOps that cleans this up.
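As an illustration of the clean-up idea in the second point, a CronJob along these lines could periodically delete pods left in a Succeeded state. The schedule, image and service account name are assumptions, and the service account needs RBAC permission to list and delete pods:

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: completed-pod-cleanup
    spec:
      schedule: "*/30 * * * *"                       # every 30 minutes
      jobTemplate:
        spec:
          template:
            spec:
              serviceAccountName: pod-cleanup        # needs list/delete on pods
              containers:
              - name: cleanup
                image: bitnami/kubectl:latest        # any image that ships kubectl
                command:
                - /bin/sh
                - -c
                - kubectl delete pods --field-selector=status.phase==Succeeded --namespace default
              restartPolicy: OnFailure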
Next Steps
All of the above steps can of course be automated through a simple build pipeline, either through GitHub Actions or Azure DevOps pipelines.
I will be building all of this functionality into a Helm chart that will also support GitHub runners, so keep an eye out!