Observability Challenge: Configuring Rules to Scale Your Application
Danilo P. De Luca
Posted on January 20, 2024
Working on a scalable architecture presents numerous challenges, especially when the system is made up of many different components. As the architecture grows, identifying and addressing problems becomes increasingly difficult.
Several practices can prove helpful in such situations, with Observability standing out as a key strategy.
In this article, you will discover how adopting effective Observability practices can help you configure the rules that scale your application. By applying these practices, you can also reduce your infrastructure costs at the end of the month.
Before delving deeper, let's briefly explore some Observability concepts.
What is Observability?
As Cindy Sridharan once mentioned in her book Distributed Systems Observability:
Observability is a spectrum of tooling and practices, with the end goal being the ability to understand the entire system’s state by looking at it from the outside.
Observability allows us to perform a range of tasks, including monitoring metrics, logging, and tracing. The primary goal is to comprehend the activities within our architecture and systems.
Monitoring Metrics: This involves quantitatively measuring various aspects of a system's performance. In simpler terms, it helps us understand the fundamental resources of our infrastructure, such as CPU usage, memory utilization, network latency, and more. The focus here is on identifying trends, anomalies, and potential bottlenecks.
Logging: Have you ever debugged a feature and used logs to understand what's happening? That's logging! This approach entails capturing and storing data about the activities within a system, which is crucial for troubleshooting issues and bugs.
Tracing: Tracing follows the entire flow of a request or transaction as it moves through the various components and services of your architecture. It can also produce a dependency map of the different parts of your system, aiding in the identification of performance issues and bottlenecks.
Configuring scalable rules for your application
As you can see, Observability is about exposing metrics from our systems so that we can take informed action. One of the pivotal advantages of scalable systems and architectures is the ability to use those metrics to establish rules for auto-scaling during periods of high usage or processing demand. Achieving this requires a systematic analysis of the data provided by observability tools.
Now, let's delve into the steps of that analysis.
1. Finding Behavioral Patterns
In the following example, you'll look at metrics from a Java application that scales according to a single rule: maintaining an average of 60k requests within a 5-minute window. This application runs in a Kubernetes (K8s) cluster, where each pod is allocated at most 16GB of memory and 4 CPU cores.
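For reference, those per-pod limits would be declared in the Deployment's pod spec. The snippet below is only a minimal sketch: the container name and the request values are assumptions, and only the 16GB memory / 4 CPU limits come from the description above.

```yaml
# Sketch of the per-pod resources described above; values under "requests" are assumed.
containers:
  - name: java-app              # hypothetical container name
    resources:
      requests:
        cpu: "1"                # assumed request values
        memory: 4Gi
      limits:
        cpu: "4"                # 4 CPU cores per pod
        memory: 16Gi            # 16GB of memory per pod
```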
In the image below, you'll find a snapshot displaying the number of pods/containers running for a Kubernetes application in our system. We currently use Datadog to monitor our architecture.
To identify patterns and understand the application's behavior, observe the peaks in the graph. In the following image, you'll notice distinct cycles indicating common scaling patterns of the application.
Some key observations:
Days 12, 13, and 14 fall on the weekend (Friday, Saturday, Sunday) and are marked by heightened access peaks in our application (the purple circle).
Consistently, during lunch hours from 11 am to 1 pm every day of the week, we observe a similar pattern (the pink circle).
Similarly, during breakfast hours from 8 am to 10 am daily, we notice a consistent behavior (the orange circle).
Weekdays 09, 10, and 11 (Tuesday, Wednesday, Thursday) exhibit nearly identical usage peaks (the green circle).
2. Merge The Identified Patterns With Other Metrics
The point of this second step is to avoid relying on a single metric as the sole source of truth for our analysis. It is therefore advisable to use at least two metrics to deepen your application analysis. In this instance, I will use one of the most common metrics in applications: Requests Per Minute.
The image below provides a snapshot of the application's Requests Per Minute metrics for the same dates.
Here, you'll observe a continuation of the same behavior indicated by the colored circles, providing further justification for the application scaling its pod count.
The third metric to consider is the resource utilization of your application. Monitoring resource usage is essential for assessing its health and ensuring efficient resource utilization.
The image below presents a snapshot of our application during the same timeframe, depicting the percentage usage of CPU and Memory resources.
So... what about that? Upon closer examination, do you notice anything peculiar? Why is the application scaling pods when the resource usage is less than 20% for CPU and less than 5% for Memory?
3. Finding Anomalies
As evident from the earlier observations, anomalies may exist in our application, given that we're scaling the number of pods despite the application's resource usage being below 20%.
To address this, our first step is to understand the auto-scaling algorithms. Kubernetes provides the HPA (Horizontal Pod Autoscaler), which uses container metrics (CPU, memory) and external or custom metrics (such as requests per second, or anything else your application exposes) to determine when a workload needs more, or fewer, pods.
In the case of this application, only one metric was configured in HPA: the number of requests in the past 5 minutes. While this aligns with the application's need for high availability during peaks of throughput over HTTP requests, relying solely on this metric might not be sufficient for effective scaling.
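For context, the original HPA probably looked something like the sketch below, with that single Pods metric. The resource names, the maximum replica count, and the exact layout are assumptions for illustration; only the metric name, the original 20,000m threshold, and the minimum of 3 pods come from this article.

```yaml
# Illustrative sketch of the original single-metric HPA.
# Names and maxReplicas are hypothetical; the metric name, the 20000m
# threshold, and the minimum of 3 pods are taken from this article.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-java-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-java-app
  minReplicas: 3
  maxReplicas: 150
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_server_requests_seconds_sum_rate5m
        target:
          type: AverageValue
          averageValue: "20000m"
```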
Having identified an anomaly in our application, it's time to take corrective action! In this scenario, we are going to reconfigure the application's HPA. The changes we'll implement are grounded in the behavior and metrics observed during the Observability process. It's crucial to recall that we identified low usage of the infrastructure resources (CPU and Memory).
The adjustment shifts the HPA to also use resource metrics for scaling the app. While retaining the metric for server requests in the past 5 minutes, we've decided to increase its threshold from 20,000m to 60,000m. The resulting metrics section of the HPA looks like this:
```yaml
metrics:
  # Request-rate metric, kept from the original configuration,
  # with the threshold raised from 20000m to 60000m.
  - type: Pods
    pods:
      metric:
        name: http_server_requests_seconds_sum_rate5m
      target:
        type: AverageValue
        averageValue: "60000m"
  # New: scale on CPU utilization.
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  # New: scale on memory utilization.
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```
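When several metrics are configured, the HPA computes a desired replica count for each one and scales to the highest of them, so the request-rate rule can still drive scale-ups during traffic spikes while the resource rules keep the pod count aligned with actual CPU and memory pressure. Also note that Utilization targets are measured against the pods' resource requests, not their limits. Once updated, the manifest can be rolled out with kubectl apply as usual.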
After implementing any change, it's crucial to restart the Observability process. By doing so, you can assess whether the application's behavior has changed and gather meaningful results from the updated configuration.
4. Restart The Observability Process After The Changes
In the image below, you'll notice the precise moment when we deployed the changes: Wednesday at around 4 PM. On the same day, a distinct scaling pattern already emerges during dinner time.
Notice the shift in patterns when examining the period from day 17 to day 20. In the image below, the colored circles depict the same peaks and patterns we observed previously.
After a thorough analysis, as depicted in the previous image, we can draw the following conclusions:
During lunchtime - indicated by the pink circle - the number of pods decreased significantly, from an average of 49 pods between days 9 and 17 to 31 pods on day 18 and 34 pods on day 19, a reduction of roughly 34% in pods!
During dinnertime on weekdays - marked by the green circle - the number of pods decreased from an average of 75 pods on days 9 to 11 to 37 pods on days 17 and 18, a reduction of roughly 50% in pods!
During dinnertime on the weekend - characterized by the purple circle - the number of pods decreased from an average of 120 pods on days 12 to 14 to 82 pods on day 20, a reduction of roughly 32% in pods!
Important Note: On day 20 (Friday) we ran an advertisement on a TV show (Big Brother Brasil) that increased the number of accesses by about 20%, and as a strategy for it we decided to raise the minimum number of pods to 25 (it was 3 before).
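In HPA terms, that temporary safeguard is just a bump to the autoscaler's floor, something along these lines (a minimal sketch; only the value of 25 and the previous value of 3 come from the note above):

```yaml
# Temporary change for the TV campaign: raise the HPA's minimum replica count.
spec:
  minReplicas: 25   # previously 3
```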
In conclusion, Observability is indispensable for maintaining the efficiency of your scalable architecture, and it goes beyond that! Observability tools can also be instrumental in reducing infrastructure costs. In this straightforward example, we are poised to save almost 40% of the previous month's cost for this application by making a seemingly minimal adjustment: adding only about 10 lines of configuration!
Looking ahead for this same application, our next steps involve further reducing the infrastructure resources allocated to it and fine-tuning the request-rate metric (http_server_requests_seconds_sum_rate5m). This adjustment matters because we've observed the application scaling without ever surpassing 30% of its allocated infrastructure resources.
Do you have any other observability metrics or strategies for configuring an auto-scale tool? Feel free to share your insights!