Could you use Prometheus remote-write to simulate a push system?
mikkergimenez
Posted on January 9, 2023
Prometheus is pull-based. I dug into why Prometheus is pull-based in a prior blog post, and then I dug into the Pushgateway. The Pushgateway creates an endpoint for services that can't be scraped, like short-lived jobs. But as that post explains, the Pushgateway isn't actually a push-based system: it caches metrics to ultimately be scraped by Prometheus at the normal scrape interval.
But Prometheus does push metrics upstream. It's how you get metrics from a local Prometheus server to a cloud-hosted service, or, if you have multiple regions, how you can pull metrics locally and then aggregate them globally. Prometheus accepts an endpoint in the "remote_write" configuration. So my curiosity is piqued:
Could you use remote-write to simulate push? Why or why not?
Based on my understanding of how Prometheus works, this is not something you would actually want to use even if it is possible, but I think it will be an interesting exercise to understand more deeply the nuances of how Prometheus works. In this blog post, I'm going to break down how the Prometheus remote_write implementation works by tracing its functionality in the source code.
What I figured out in this tracing is that Prometheus has a Write-Ahead Log (WAL). Per Wikipedia, write-ahead logging is "a family of techniques for providing atomicity and durability (two of the ACID properties) in database systems. A write ahead log is an append-only auxiliary disk-resident structure used for crash and transaction recovery. The changes are first recorded in the log, which must be written to stable storage, before the changes are written to the database."
So Prometheus records all writes to this write-ahead log before processing them for storage, either local or remote. In this article I won't go into the processes that create the write-ahead log, but I will trace how data gets from the WAL to the upstream server.
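As a toy illustration of that idea (purely a sketch, not Prometheus's actual WAL code or on-disk format), the pattern is: append the record to the log, make sure it's on stable storage, and only then apply it to the store, so a crash can be recovered by replaying the log.

package walsketch

import (
	"fmt"
	"os"
)

type store map[string]float64

func writeSample(wal *os.File, db store, series string, value float64) error {
	// 1. Append the record to the log and flush it to stable storage.
	if _, err := fmt.Fprintf(wal, "%s %f\n", series, value); err != nil {
		return err
	}
	if err := wal.Sync(); err != nil {
		return err
	}
	// 2. Only once the log write is durable, apply the change to the store.
	db[series] = value
	return nil
}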
Prometheus has a queue manager to manage the queue of samples it will push upstream. That queue manager launches shards to increase parallelization. Each shard looks after one of the queues, which are created by the QueueManager in newShards:
queue := s.queues[i]
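As a rough mental model (simplified types, not the actual Prometheus code), you can picture each shard owning one queue, with every incoming sample routed to a shard based on its series reference, so samples from the same series stay on the same queue and keep their order:

// Simplified mental model of the shards, not the real Prometheus types:
// one queue (channel) per shard, samples routed by series reference.
type sample struct {
	ref   uint64  // series reference taken from the WAL record
	value float64
	ts    int64   // timestamp in milliseconds
}

type shards struct {
	queues []chan sample
}

func (s *shards) enqueue(smp sample) {
	i := smp.ref % uint64(len(s.queues))
	s.queues[i] <- smp
}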
The shard sends batches of samples. A batch is sent when either the number of pending samples reaches MaxSamplesPerSend (default 500, configured here) or the BatchSendDeadline is exceeded (default 5 seconds):
case sample, ok := <-queue:
	if !ok {
		if len(pendingSamples) > 0 {
			level.Debug(s.qm.logger).Log("msg", "Flushing samples to remote storage...", "count", len(pendingSamples))
			s.sendSamples(pendingSamples)
			level.Debug(s.qm.logger).Log("msg", "Done flushing.")
		}
		return
	}

	queueLength.WithLabelValues(s.qm.queueName).Dec()
	pendingSamples = append(pendingSamples, sample)

	if len(pendingSamples) >= s.qm.cfg.MaxSamplesPerSend {
		s.sendSamples(pendingSamples[:s.qm.cfg.MaxSamplesPerSend])
		pendingSamples = pendingSamples[s.qm.cfg.MaxSamplesPerSend:]

		stop()
		timer.Reset(s.qm.cfg.BatchSendDeadline)
	}

case <-timer.C:
	if len(pendingSamples) > 0 {
		s.sendSamples(pendingSamples)
		pendingSamples = pendingSamples[:0]
	}
	timer.Reset(s.qm.cfg.BatchSendDeadline)
}
But how does Prometheus end up using remote_write? It does it asynchronously, after reading the WAL (write-ahead log). There is an Appender interface, managed in the storage code here:
Storage in storage.go defines an Appender:

// line 183
func (s *Storage) Appender(ctx context.Context) storage.Appender {
	return s.rws.Appender(ctx)
}

// line 57
// rws is of type WriteStorage
rws *WriteStorage
write.go implements WriteStorage, which has a map of the aforementioned QueueManager queues:
type WriteStorage struct {
	logger            log.Logger
	reg               prometheus.Registerer
	mtx               sync.Mutex
	watcherMetrics    *wlog.WatcherMetrics
	liveReaderMetrics *wlog.LiveReaderMetrics
	externalLabels    labels.Labels
	dir               string
	queues            map[string]*QueueManager
	samplesIn         *ewmaRate
	flushDeadline     time.Duration
	interner          *pool
	scraper           ReadyScrapeManager
	quit              chan struct{}

	// For timestampTracker.
	highestTimestamp *maxTimestamp
}
The QueueManager creates a 'watcher' which reads the WAL (write-ahead log) and writes samples to the queue to be sent upstream using remote_write:
t.watcher = wlog.NewWatcher(watcherMetrics, readerMetrics, logger, client.Name(), t, dir, enableExemplarRemoteWrite, enableNativeHistogramRemoteWrite)
/* Run the watcher, which will tail the WAL until the quit channel is closed
or an error case is hit. ( https://github.com/prometheus/prometheus/blob/48bccc50c8cca58098f4a9e2c4c54556cab31230/tsdb/wlog/watcher.go#L225)
*/
// Line 245
if err == nil {
	if err = w.readCheckpoint(lastCheckpoint, (*Watcher).readSegment); err != nil {
		return errors.Wrap(err, "readCheckpoint")
	}
}
// readSegment ultimately calls Append on QueueManager, which we saw above was assigned when creating the watcher.
// Line 521
if len(samplesToSend) > 0 {
	w.writer.Append(samplesToSend)
	samplesToSend = samplesToSend[:0]
}
Then we come full circle: the Append function on QueueManager calls enqueue on the shards to add samples to the queue, taking us back to the first example where the shard sends batches of samples.
I think it would be fun to experiment with this, so over the next two weeks I'm going to elaborate on this post. Next week I'll look at the remote_write API endpoint to see how Prometheus accepts remote_write data. In the blog post that follows, I'm going to try an experiment: I'll create a lightweight Go agent to measure a single metric, like CPU utilization, and use remote_write to push metrics into Prometheus. As I said before, this is entirely for experimentation and understanding of the underlying primitives; I doubt anyone would recommend actually creating write agents this way.
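To give a flavor of where that experiment is headed, here is a rough sketch of what such an agent might look like. It assumes the prompb and snappy libraries, a placeholder metric name and value, and a Prometheus server started with its remote-write receiver enabled; treat it as an illustration of the protocol (a snappy-compressed protobuf WriteRequest POSTed to the receiver), not a finished agent.

// Rough sketch of a push agent using the remote-write protocol: build a
// prompb.WriteRequest, snappy-compress the marshalled protobuf, and POST it.
// The metric name, value, and URL below are placeholders for the experiment.
package main

import (
	"bytes"
	"log"
	"net/http"
	"time"

	"github.com/gogo/protobuf/proto"
	"github.com/golang/snappy"
	"github.com/prometheus/prometheus/prompb"
)

func main() {
	req := &prompb.WriteRequest{
		Timeseries: []prompb.TimeSeries{{
			Labels: []prompb.Label{
				{Name: "__name__", Value: "experiment_cpu_utilization"},
				{Name: "instance", Value: "localhost"},
			},
			Samples: []prompb.Sample{
				{Value: 0.42, Timestamp: time.Now().UnixMilli()},
			},
		}},
	}

	raw, err := proto.Marshal(req)
	if err != nil {
		log.Fatal(err)
	}
	compressed := snappy.Encode(nil, raw)

	// Assumes Prometheus was started with its remote-write receiver enabled.
	httpReq, err := http.NewRequest(http.MethodPost,
		"http://localhost:9090/api/v1/write", bytes.NewReader(compressed))
	if err != nil {
		log.Fatal(err)
	}
	httpReq.Header.Set("Content-Encoding", "snappy")
	httpReq.Header.Set("Content-Type", "application/x-protobuf")
	httpReq.Header.Set("X-Prometheus-Remote-Write-Version", "0.1.0")

	resp, err := http.DefaultClient.Do(httpReq)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("remote write response:", resp.Status)
}

If the receiver accepts the request, the series should be queryable in Prometheus just as if it had been scraped.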