Back2Basics: Monitoring Workloads on Amazon EKS
Romar Cablao
Posted on June 26, 2024
Overview
We're down to the last part of this series✨ In this part, we will explore monitoring solutions. Remember the voting app we deployed? We will set up a basic dashboard to monitor each component's CPU and memory utilization. Additionally, we’ll test how the application behaves under load.
If you haven't read the second part, you can check it out here:
Back2Basics: Running Workloads on Amazon EKS
Romar Cablao for AWS Community Builders ・ Jun 19
Grafana & Prometheus
To start with, let’s briefly discuss the solutions we will be using. Grafana and Prometheus are a common pairing for monitoring metrics, creating dashboards and setting up alerts. Both are open-source and can be deployed on a Kubernetes cluster, which is exactly what we will do shortly.
- Grafana is open source visualization and analytics software. It allows you to query, visualize, alert on, and explore your metrics, logs, and traces no matter where they are stored. It provides you with tools to turn your time-series database data into insightful graphs and visualizations. Read more: https://grafana.com/docs/grafana/latest/fundamentals/
- Prometheus is an open-source systems monitoring and alerting toolkit. It collects and stores its metrics as time series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels. Read more: https://prometheus.io/docs/introduction/overview/
Alternatively, you can use an AWS native service like Amazon CloudWatch, or a managed service like Amazon Managed Service for Prometheus and Amazon Managed Grafana. However, in this part, we will only cover self-hosted Prometheus and Grafana, which we will host on Amazon EKS.
Let's get our hands dirty!
Like the previous activity, we will use the same repository. First, make sure to uncomment all commented lines in 03_eks.tf, 04_karpenter.tf and 05_addons.tf to enable Karpenter and other addons we used in the previous activity.
Second, enable Grafana and Prometheus by adding these lines in terraform.tfvars:
enable_grafana = true
enable_prometheus = true
Once updated, we have to run tofu init, tofu plan and tofu apply. When prompted to confirm, type yes to proceed with provisioning the additional resources.
Accessing Grafana
We need credentials to access Grafana. The default username is admin and the auto-generated password is stored in a Kubernetes secret. To retrieve the password, you can use the command below:
kubectl -n grafana get secret grafana -o jsonpath="{.data.admin-password}" | base64 -d
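For readers curious about what the pipeline above does: Kubernetes stores Secret values base64-encoded, so base64 -d simply decodes the admin-password field. Below is a minimal Python sketch of that decoding step. The secret payload is made up for illustration and the password is a placeholder, not a real value.

```python
import base64
import json

# Hypothetical, trimmed output of `kubectl -n grafana get secret grafana -o json`.
# Both values are placeholders encoded here only for the demonstration.
secret_json = json.dumps({
    "data": {
        "admin-user": base64.b64encode(b"admin").decode(),
        "admin-password": base64.b64encode(b"example-password").decode(),
    }
})


def decode_secret_field(raw_json: str, field: str) -> str:
    """Return the decoded value of one field under .data in a Secret."""
    data = json.loads(raw_json)["data"]
    return base64.b64decode(data[field]).decode("utf-8")


print(decode_secret_field(secret_json, "admin-password"))
```

Base64 is an encoding, not encryption, so anyone who can read the Secret can recover the password.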
This is what the home or landing page would look like. You have the navigation bar on the left side where you can navigate through different features of Grafana, including but not limited to Dashboards and Alerting.
It's also worth noting the Prometheus instance that we have deployed. You might be asking: does the Prometheus server have a UI? Yes, it does. You can even run PromQL queries and check the health of the scrape targets there. But we will use Grafana for visualization instead.
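Besides the UI, Prometheus also answers PromQL over its HTTP API at /api/v1/query. The sketch below only builds such a request URL and does not send it. The base URL is the in-cluster address configured as the data source later in this post, and the example query is an assumption: it presumes cAdvisor container metrics are being scraped and that the app runs in the voting-app namespace.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# In-cluster Prometheus address (the same one used for the Grafana data source).
PROMETHEUS_URL = "http://prometheus-server.prometheus.svc.cluster.local"


def instant_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build a URL for Prometheus' instant-query endpoint."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


# Illustrative query: per-pod CPU usage over the last five minutes.
cpu_query = (
    'sum(rate(container_cpu_usage_seconds_total{namespace="voting-app"}[5m])) by (pod)'
)

url = instant_query_url(cpu_query)
print(url)
```

The address only resolves from inside the cluster, so the URL would have to be requested from a pod or through a port-forward.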
Setting up our first data source
Before we can create dashboards and alerts, we first have to configure the data source.
First, expand the Connections menu and click Data Sources.
Click Add data source. Then select Prometheus.
Set the Prometheus server URL to http://prometheus-server.prometheus.svc.cluster.local. Since Prometheus and Grafana reside on the same cluster, we can use the Kubernetes service as the endpoint.
Leave other configuration as default. Once updated, click Save & test.
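The same settings can also be expressed as data rather than clicks. As a rough sketch, the snippet below builds the JSON body that Grafana's HTTP API accepts on POST /api/datasources. It only prints the payload and does not send it. The data source name and the isDefault choice are assumptions made for illustration.

```python
import json


def prometheus_datasource(name: str, url: str) -> dict:
    """Describe a Prometheus data source as a Grafana API payload."""
    return {
        "name": name,
        "type": "prometheus",
        "url": url,
        "access": "proxy",  # Grafana's backend makes the requests (server mode)
        "isDefault": True,  # assumption: treat this as the default data source
    }


payload = prometheus_datasource(
    name="Prometheus",  # display name is arbitrary
    url="http://prometheus-server.prometheus.svc.cluster.local",
)
print(json.dumps(payload, indent=2))
```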
Now we have our first data source! We will use this to create dashboards in the next few sections.
Grafana Dashboards
Let’s start by importing an existing dashboard. Dashboards can be searched here: https://grafana.com/grafana/dashboards/
For example, consider this dashboard - 315: Kubernetes Cluster Monitoring via Prometheus
To import this dashboard, either copy the Dashboard ID or download the JSON model. For this instance, use the dashboard ID 315 and import it into our Grafana instance.
Select the Prometheus data source we've configured earlier. Then click Import.
You will then be redirected to the dashboard and it should look like this:
Yey🎉 We now have our first dashboard!
Let's Create a Custom Dashboard for our Voting App
Copy this JSON model and import it into our Grafana instance. This is similar to the steps above, but this time, instead of ID, we'll use the JSON field to paste the copied template.
Once imported, the dashboard should look like this:
Here we have visualizations for basic metrics such as CPU and memory utilization for each component. Replica count and node count are also part of the dashboard, so we can later check how the vote-app component behaves when it autoscales.
Let's Test!
If you haven't deployed the voting-app, please refer to the command below:
helm -n voting-app upgrade --install app -f workloads/helm/values.yaml thecloudspark/vote-app --create-namespace
You can customize the namespace voting-app and release name app as needed, but update the dashboard queries accordingly. I recommend using the command above with the same naming: voting-app for the namespace and app as the release name.
Back to our dashboard: When the vote-app has minimal load, it scales down to a single replica (1), as shown below.
Horizontal Pod Autoscaling in Action
The vote-app deployment has Horizontal Pod Autoscaler (HPA) configured with a maximum of five replicas. This means the voting app will automatically scale up to five pods to handle increased load. We can observe this behavior when we apply the seeder deployment.
Now, let's test how the vote-app handles increased load using a seeder deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: seeder
  namespace: voting-app
spec:
  replicas: 5
...
The seeder deployment simulates real user load by bombarding the vote-app with vote requests. It has five replicas and allows you to specify the target endpoint using an environment variable. In this example, we'll target the Kubernetes service directly instead of the load balancer.
...
env:
  - name: VOTE_URL
    value: "http://app-vote.voting-app.svc.cluster.local/"
...
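The VOTE_URL value follows the usual in-cluster DNS pattern for a Kubernetes service: service name, then namespace, then svc and the cluster domain. Here is a small sketch of how that address is composed. The service name app-vote and namespace voting-app are taken from the URL above, while cluster.local is the common default cluster domain and is an assumption about this cluster.

```python
def service_url(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Compose the in-cluster HTTP URL of a Kubernetes service."""
    return f"http://{service}.{namespace}.svc.{cluster_domain}/"


# Matches the VOTE_URL used by the seeder deployment.
print(service_url("app-vote", "voting-app"))
```

If you deployed the chart under a different namespace or release name, the URL given to the seeder would need to change to match.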
To apply, use the command below:
kubectl apply -f workloads/seeder/seeder-app.yaml
After a few seconds, monitor your dashboard. You'll see the vote-app replicas increase to handle the load generated by the seeder.
D:\> kubectl -n voting-app get hpa
NAME           REFERENCE             TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
app-vote-hpa   Deployment/app-vote   cpu: 72%/80%   1         5         5          12m
Since the vote-app chart's default maximum for the Horizontal Pod Autoscaler (HPA) is five, the replica count for this deployment stops at five.
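The numbers in the output above can be related to the replica calculation described in the Kubernetes HPA documentation: the desired count is the current count multiplied by the ratio of observed to target utilization, rounded up, then clamped to the configured minimum and maximum. Below is a rough Python sketch using the 80% CPU target and the bounds of one and five from the output. The 10% tolerance is the Kubernetes default and an assumption about this cluster, and the real controller also applies stabilization windows that this sketch ignores.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float = 80.0,
    min_replicas: int = 1,
    max_replicas: int = 5,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA replica calculation for a single CPU metric."""
    ratio = current_utilization / target_utilization
    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Never go below the minimum or above the maximum.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(1, 200))  # heavy load on a single pod
print(desired_replicas(3, 200))  # would exceed the maximum, so it is capped
print(desired_replicas(5, 72))   # close to target, no change
```

With five replicas at 72% against an 80% target, the ratio falls within the tolerance, which is consistent with the replica count staying at five.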
Stopping the Load and Scaling Down
Once you've observed the scaling behavior, delete the seeder deployment to stop the simulated load:
kubectl delete -f workloads/seeder/seeder-app.yaml
Give the dashboard a few minutes and observe the vote-app scaling down. With no more load, the HPA will reduce replicas, down to a minimum of one. This may also lead to a node being decommissioned by Karpenter if pod scheduling becomes less demanding.
You'll see that the vote-app eventually scales in now that there is less load. As you might see above, the node count also changes from two to one - showing the power of Karpenter.
PS D:\> kubectl -n voting-app get hpa
NAME           REFERENCE             TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
app-vote-hpa   Deployment/app-vote   cpu: 5%/80%   1         5         2          18m
Challenge: Scaling Workloads
We've successfully enabled autoscaling for the vote-app component using Horizontal Pod Autoscaler (HPA). This is a powerful technique to manage resource utilization in Kubernetes. But HPA isn't limited to just one component.
Tip: Explore the ArtifactHub: Vote App configuration in more detail. You'll find additional configurations related to HPA that you can leverage for other deployments.
Conclusion
Yey! You've reached the end of the Back2Basics: Amazon EKS Series 🌟🚀. This series provided a foundational understanding of deploying and managing containerized applications on Amazon EKS. We covered:
- Provisioning an EKS cluster using OpenTofu
- Deploying workloads leveraging Karpenter
- Monitoring applications using Prometheus and Grafana
While Kubernetes can have a learning curve, hopefully, this series empowered you to take your first steps. Ready to level up? Let me know in the comments what Kubernetes topics you'd like to explore next!