A Gotcha With Gauges
Jeff Dwyer
Posted on April 24, 2023
I recently hit a gotcha while trying to use a Gauge metric, and I wanted to share it.
I love metrics and I use Micrometer as a facade for them.
You can annotate a method with @Timed, which is delightful, but typical usage looks like the following:
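import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

SimpleMeterRegistry registry = new SimpleMeterRegistry();
Counter counter = registry.counter("page.visitors", "age", "20s");
counter.increment();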
I typically use counters and timers and they have always been pretty straightforward.
We're building a system for dynamic config now, and as part of that I wanted to track the current number of connections. Counters and Timers aren't right for this sort of thing, however. What I needed was a Gauge, which should work right out of the box.
I did something like the following and thought that would be it:
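class ConnectionHolder {
    // Keep a strong reference: Micrometer only holds gauge state objects weakly.
    private final AtomicInteger connections;

    @Inject
    public ConnectionHolder(MeterRegistry meterRegistry) {
        // gauge(...) returns the AtomicInteger it was given, so we can update it later.
        connections =
            meterRegistry.gauge(
                "config.project-connections",
                Tags.empty(),
                new AtomicInteger()
            );
    }

    @Scheduled(fixedDelay = "1m")
    public void recordConnections() {
        // calculateConnections() is defined elsewhere and not shown here.
        connections.set(calculateConnections());
    }
}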
I made two connections to test it out and looked at the metrics. I was very surprised to see my numbers all over the place! Sometimes zero, sometimes one, sometimes two.
The Problem
Trying to use Gauges exposed a setup problem in my Datadog metrics. The core issue is that Gauges are not additive in the same way as Counters. Here's what I mean:
If server A writes a metric counter:apples=2
And server B writes a metric counter:apples=1
Both of those metrics will be saved in the time period and we can add those together and get 3.
Gauges, however, are stored differently.
If server A writes a metric gauge:apples=2
And server B writes a metric gauge:apples=1
The metrics backend will only store the most recent value: gauge:apples=1.
In my case, this meant a minute-by-minute race condition between the two running pods: whichever pod reported last in a given interval decided the value I saw.
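To make the difference concrete, here is a toy model in Java of a backend handling one reporting interval. The class NaiveMetricStore and its methods are hypothetical names made up for illustration. This is not how Datadog is implemented, only a sketch of the summing versus last-write-wins behavior described above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a metrics backend for a single reporting interval.
// Hypothetical names: this is not Datadog's or Micrometer's API.
class NaiveMetricStore {
    private final Map<String, Integer> counters = new HashMap<>();
    private final Map<String, Integer> gauges = new HashMap<>();

    // Counter submissions from different servers are added together.
    void submitCounter(String name, int value) {
        counters.merge(name, value, Integer::sum);
    }

    // Gauge submissions with the same name overwrite each other: last write wins.
    void submitGauge(String name, int value) {
        gauges.put(name, value);
    }

    int counter(String name) {
        return counters.getOrDefault(name, 0);
    }

    int gauge(String name) {
        return gauges.getOrDefault(name, 0);
    }

    public static void main(String[] args) {
        NaiveMetricStore store = new NaiveMetricStore();
        store.submitCounter("apples", 2); // server A
        store.submitCounter("apples", 1); // server B
        store.submitGauge("apples", 2);   // server A
        store.submitGauge("apples", 1);   // server B

        System.out.println("counter:apples=" + store.counter("apples")); // 3
        System.out.println("gauge:apples=" + store.gauge("apples"));     // 1
    }
}
```

If server A happened to report after server B, the gauge would read 2 instead, which is the kind of flapping I was seeing.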
The Solution
The solution is proper metric tagging. Each server needs to tag its values as having come from it, which prevents the backend from clobbering one server's value with another's.
gauge:apples=2,pod:abc123
gauge:apples=1,pod:xyz789
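A rough way to picture why the tag helps: once the pod is part of the series key, each pod gets its own slot, and the values can be summed at query time. The sketch below is a toy model with hypothetical names (TaggedGaugeStore, sumAcrossPods), not a real backend API.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: a gauge series is identified by its name plus its pod tag.
// Hypothetical names, for illustration only.
class TaggedGaugeStore {
    private final Map<String, Integer> series = new HashMap<>();

    // Each (name, pod) pair is its own series, so pods no longer overwrite each other.
    void submitGauge(String name, String pod, int value) {
        series.put(name + "|pod:" + pod, value);
    }

    // Query-time aggregation: add up the latest value from every pod.
    int sumAcrossPods(String name) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : series.entrySet()) {
            if (entry.getKey().startsWith(name + "|")) {
                total += entry.getValue();
            }
        }
        return total;
    }

    int seriesCount() {
        return series.size();
    }

    public static void main(String[] args) {
        TaggedGaugeStore store = new TaggedGaugeStore();
        store.submitGauge("apples", "abc123", 2);
        store.submitGauge("apples", "xyz789", 1);

        System.out.println("series=" + store.seriesCount());          // 2
        System.out.println("total=" + store.sumAcrossPods("apples")); // 3
    }
}
```

A later report from the same pod still replaces that pod's previous value, which is what you want from a gauge.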
The problem in my case was that my stats were not being sent through my Datadog sidecar. They were going straight to the Datadog API, which bypassed the tagging process that was already working for APM.
For more info, this post has the full details, including how to pass the pod ID into the containers and set the tagging explicitly.
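As a minimal sketch of the explicit-tagging idea, assuming the pod ID reaches the container as an environment variable: POD_NAME here is a hypothetical variable name (you would have to set it yourself, for example through the Kubernetes Downward API), and falling back to HOSTNAME is my own assumption. The class PodTag is made up for illustration.

```java
import java.util.Map;

// Builds a "pod:<id>" tag from the container environment.
// POD_NAME is a hypothetical variable name; HOSTNAME is an assumed fallback.
class PodTag {
    static String fromEnv(Map<String, String> env) {
        String pod = env.get("POD_NAME");
        if (pod == null || pod.isEmpty()) {
            pod = env.get("HOSTNAME");
        }
        if (pod == null || pod.isEmpty()) {
            pod = "unknown";
        }
        return "pod:" + pod;
    }

    public static void main(String[] args) {
        // Prints the tag this process would attach to its metrics.
        System.out.println(fromEnv(System.getenv()));
    }
}
```

With Micrometer, a tag like this can be attached to every meter at startup through the registry's commonTags configuration, so individual gauges don't each need to remember it.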
Takeaway
You can forget about the underlying way that stats are collected with counters, but gauges are less forgiving. Keep that in mind and happy counting.