How to extend the Geth collector

This is the last post in a two-part blog series about Netdata and Geth. If you missed the first, be sure to check it out here.

Geth is short for Go-Ethereum and is the official implementation of the Ethereum Client in Go. Currently it's one of the most widely used implementations and a core piece of infrastructure for the Ethereum ecosystem.

With this proof of concept, I wanted to showcase how easy it is to gather data from any Prometheus endpoint and visualize it in Netdata. This has the added benefit of leveraging all the other features of Netdata, namely its per-second data collection, automatic deployment and configuration, and superb system monitoring.

The most challenging aspect is making sense of the metrics and organizing them into meaningful charts. In other words, the hard part is the expertise required to understand what each metric means and whether it makes sense to surface it to the user.

Note that some metrics make sense for some users, and other metrics for others. We want to surface all metrics that make sense. When developing an application, you need much lower-level metrics (e.g. eBPF) than when operating it.

Let's get down to it.

A note on collectors

First, let's do a very brief intro to what a collector is.

In Netdata, every collector is composed of a plugin and a module. The plugin is an orchestrator process that is responsible for running jobs; each job is an instance of a module.

When we are "creating" a collector, in essence we select a plugin and we develop a module for that plugin.

For Geth, since we are using the Prometheus Endpoint, it's easier to use our Golang Plugin, as it has internal libraries to gather data from Prometheus endpoints.
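To make the plugin/module/job relationship concrete, here is a minimal sketch in Go. The Module interface, the Job and Plugin types, and the staticModule toy are all hypothetical names invented for this illustration; they are far simpler than the real framework and do not mirror its API.

```go
package main

import "fmt"

// Module is a hypothetical, simplified stand-in for a collector module:
// something that can return a set of named values on demand.
type Module interface {
	Collect() map[string]int64
}

// Job is one configured instance of a module (for example, one Geth node).
type Job struct {
	Name   string
	Module Module
}

// Plugin plays the orchestrator role: it owns the jobs and runs them.
type Plugin struct {
	Jobs []Job
}

// RunOnce asks every job for its latest values, keyed by job name.
func (p Plugin) RunOnce() map[string]map[string]int64 {
	out := make(map[string]map[string]int64)
	for _, j := range p.Jobs {
		out[j.Name] = j.Module.Collect()
	}
	return out
}

// staticModule is a toy module that always reports the same values.
type staticModule map[string]int64

func (s staticModule) Collect() map[string]int64 { return s }

func main() {
	p := Plugin{Jobs: []Job{
		{Name: "geth_local", Module: staticModule{"chain_head_block": 100}},
		{Name: "geth_remote", Module: staticModule{"chain_head_block": 98}},
	}}
	fmt.Println(p.RunOnce())
}
```

The point of the split is that one module implementation can back many jobs, so monitoring a second Geth node only needs a second job, not new code.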

The following image is useful:

If you want to dive into the Netdata Collector framework:

Geth collector structure

So, in essence, the Geth collector is the Geth module of go.d.plugin.

As you can see on GitHub, the module is composed of four files:

charts.go: Chart definitions
collect.go: Actual data collection, using the metric variables defined in metrics.go
geth.go: Main structure, mostly boilerplate.
metrics.go: Defines the metric variables that map to the corresponding Prometheus metric names.

How to extend the Geth collector with a new metric

It's very simple, really.

Open your Prometheus endpoint and find the metrics that you want to visualize with Netdata.

e.g. p2p_ingress_eth_65_0x08

Open metrics.go and define a new constant that holds the metric name.

e.g. const p2pIngressEth650x08 = "p2p_ingress_eth_65_0x08"
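If you want to check by hand that the name you picked really appears in the endpoint output, a rough sketch like the following can help. It assumes the endpoint returns the standard Prometheus text format and that the metric has no labels; findMetric is a hypothetical helper written for this post, and the sample text (including the gauge type) is made up.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Same constant as the one defined in metrics.go.
const p2pIngressEth650x08 = "p2p_ingress_eth_65_0x08"

// findMetric scans Prometheus text-format output for an unlabeled sample
// with the given name and returns its value.
func findMetric(exposition, name string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != name {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return 0, false
		}
		return v, true
	}
	return 0, false
}

func main() {
	// Made-up sample output; real values come from your own node.
	sample := "# TYPE p2p_ingress_eth_65_0x08 gauge\np2p_ingress_eth_65_0x08 4096\n"
	v, ok := findMetric(sample, p2pIngressEth650x08)
	fmt.Println(v, ok)
}
```

Inside the actual module you do not need any of this parsing, since the Golang plugin's Prometheus library does it for you.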

Open collect.go and create a new function, identical to the ones that already exist. Although it doesn't really make a difference in our case, we strive to organize the metrics into sensible functions (e.g. gather all p2pEth65 metrics in one function). This is the function where we will do any computation on the raw value that we gather.

Note that Netdata will automatically take care of units such as bytes and will show the most human-readable unit in the dashboard (e.g. MB, GB, etc.).

e.g.

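// collectP2pEth65 gathers all p2pEth65 metrics in one place.
func (g *Geth) collectP2pEth65(mx map[string]float64, pms prometheus.Metrics) {
    pms = pms.FindByNames(
        p2pIngressEth650x08,
    )
    g.collectEth(mx, pms)
    // Any computation on the raw value goes here (1234 is an arbitrary example).
    mx[p2pIngressEth650x08] += 1234
}

// collectEth sums the value of every metric in pms into mx, keyed by metric name.
func (g *Geth) collectEth(mx map[string]float64, pms prometheus.Metrics) {
    for _, pm := range pms {
        mx[pm.Name()] += pm.Value
    }
}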

We also need to call the new function from the central function that the module runs at the defined interval.

func (g *Geth) collectGeth(pms prometheus.Metrics) map[string]float64 {
    mx := make(map[string]float64)
    g.collectChainData(mx, pms)
    g.collectP2P(mx, pms)
    g.collectTxPool(mx, pms)
    g.collectRpc(mx, pms)
    g.collectP2pEth65(mx, pms)
    return mx
}
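The snippets above depend on the plugin's own Prometheus package, so they cannot be run on their own. The following sketch reproduces the same flow (filter by name, sum into a map, call from a central function) with simplified stand-in types. Metric, Metrics and this FindByNames are assumptions written for this example and are not the real library types.

```go
package main

import "fmt"

const p2pIngressEth650x08 = "p2p_ingress_eth_65_0x08"

// Metric and Metrics are simplified stand-ins for the plugin's Prometheus
// types; they are assumptions made for this sketch, not the real API.
type Metric struct {
	Name  string
	Value float64
}

type Metrics []Metric

// FindByNames keeps only the samples whose name is in names.
func (ms Metrics) FindByNames(names ...string) Metrics {
	want := make(map[string]bool, len(names))
	for _, n := range names {
		want[n] = true
	}
	var out Metrics
	for _, m := range ms {
		if want[m.Name] {
			out = append(out, m)
		}
	}
	return out
}

// collectP2pEth65 filters the scrape and sums matching samples into mx.
func collectP2pEth65(mx map[string]float64, pms Metrics) {
	for _, pm := range pms.FindByNames(p2pIngressEth650x08) {
		mx[pm.Name] += pm.Value
	}
}

// collect is the central function: it calls every per-group collector.
func collect(pms Metrics) map[string]float64 {
	mx := make(map[string]float64)
	collectP2pEth65(mx, pms)
	return mx
}

func main() {
	scrape := Metrics{
		{Name: p2pIngressEth650x08, Value: 10},
		{Name: "some_other_metric", Value: 99},
		{Name: p2pIngressEth650x08, Value: 5},
	}
	fmt.Println(collect(scrape))
}
```

Summing with += means that several samples sharing one name (for example, the same metric with different labels) end up as a single value per name.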

Lastly, now that we have the value inside the module, we need to create the chart for that value. We do that in charts.go:

chartReorgs = Chart{
    ID:    "reorgs_executed",
    Title: "Executed Reorgs",
    Units: "reorgs",
    Fam:   "reorgs",
    Ctx:   "geth.reorgs",
    Dims: Dims{
        {ID: reorgsExecuted, Name: "executed"},
    },
}
chartReorgsBlocks = Chart{
    ID:    "reorgs_blocks",
    Title: "Blocks Added/Removed from Reorg",
    Units: "blocks",
    Fam:   "reorgs",
    Ctx:   "geth.reorgs_blocks",
    Type:  Line,
    Dims: Dims{
        {ID: reorgsAdd, Name: "added", Algorithm: "absolute"},
        {ID: reorgsDropped, Name: "dropped"},
    },
}

Let's explain the fields of the structure:

ID: The unique identifier of the chart.
Title: A human-readable title for the front end.
Units: The units for the dimensions. Notice that Netdata can automatically scale certain units, so that the raw collector value stays in bytes but the user sees megabytes on the dashboard. You can find a list of supported "automatically scaled" units in this file.
Fam: The submenu title, used to group multiple charts together.
Ctx: The identifier for the particular chart, similar to ID. Use the convention <collector_name>.<chart_id>.
Type: Line (default), Area or Stacked. Area is best used with dimensions that signify "bandwidth". Stacked is best when it makes sense to visually observe the sum of the dimensions (e.g. the system.ram chart is stacked).
Dims:
- ID: The variable name for that dimension.
- Name: A human-readable name for the dimension.
- Algorithm:
  - absolute: The default if omitted. Netdata shows the value that it gets from the collector.
  - incremental: Netdata shows the per-second rate of the value. It automatically takes the delta between two data collections, computes the per-second value and shows it.
  - percentage: Netdata shows the percentage of the dimension in relation to the sum of all the dimensions of the chart. If four dimensions have value = 1, each shows 25%.
- Mul: Multiply the value by some integer.
- Div: Divide the value by some integer.
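The arithmetic behind the dimension options can be sketched in a few lines of Go. This only illustrates the calculations described above; the function names are hypothetical and this is not how Netdata implements them internally.

```go
package main

import "fmt"

// incremental returns the per-second rate between two collected values
// taken `seconds` apart.
func incremental(prev, curr, seconds float64) float64 {
	return (curr - prev) / seconds
}

// percentage returns each dimension's share of the chart total, in percent.
func percentage(dims map[string]float64) map[string]float64 {
	var total float64
	for _, v := range dims {
		total += v
	}
	out := make(map[string]float64, len(dims))
	for k, v := range dims {
		if total == 0 {
			out[k] = 0 // avoid dividing by zero on an empty chart
			continue
		}
		out[k] = v / total * 100
	}
	return out
}

// scale applies a dimension's multiplier and divisor to a raw value.
// div must be non-zero.
func scale(raw float64, mul, div int) float64 {
	return raw * float64(mul) / float64(div)
}

func main() {
	fmt.Println(incremental(1000, 1600, 2))
	fmt.Println(percentage(map[string]float64{"a": 1, "b": 1, "c": 1, "d": 1}))
	fmt.Println(scale(2048, 1, 1024))
}
```

For a counter that only ever grows, such as total bytes received, incremental is usually what you want, while a value that is already a current reading fits absolute.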

A final note on extending Geth

The Prometheus endpoint is not the only way to monitor Geth, but it's the simplest.

If you feel adventurous, you can try to implement a collector that also uses Geth's RPC endpoint to pull data (e.g. to show charts about specific contracts in real time) or even Geth's logs.

To use Geth's RPC endpoint with Golang, take a look at Geth's documentation.

To monitor Geth's logs, you can use our weblog collector as a template. It monitors Apache and NGINX servers by parsing their logs.

Add alerts to Geth charts

Now that we have defined the new charts, we may want to define alerts for them. The full alert syntax is out-of-scope for this tutorial, but it shouldn't be difficult once you get the hang of it.

For example, here is a simple alarm that tells me if Geth is synced or not, based on whether header and block values are the same:

# chainhead_header is expected to be momentarily ahead. If it's considerably ahead (e.g. more than 5 blocks), then the node is definitely out of sync.
 template: geth_chainhead_diff_between_header_block
       on: geth.chainhead
    class: Workload
     type: ethereum_node
component: geth
    every: 10s
     calc: $chain_head_header - $chain_head_block
    units: blocks
     warn: $this != 0
     crit: $this > 5
    delay: up 5s

You can read the above example as follows:
On the charts that have the context geth.chainhead (thus all the Geth nodes that we may monitor with a single Netdata Agent), calculate every 10s the difference between the dimensions chain_head_block and chain_head_header. If it's not 0, raise a warning alert. If it's more than 5, raise a critical alert.
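The decision logic of this alert can be written out as a small Go function. This is only a sketch for reasoning about the thresholds: alertStatus is a hypothetical helper, and it assumes the difference is taken as header minus block, so that a node lagging behind yields a positive number.

```go
package main

import "fmt"

// alertStatus mirrors the warn/crit conditions of the template. diff is the
// number of blocks by which the header is ahead of the processed block
// (an assumption of this sketch).
func alertStatus(header, block int64) string {
	diff := header - block
	switch {
	case diff > 5:
		return "CRITICAL" // checked first: critical takes precedence
	case diff != 0:
		return "WARNING"
	default:
		return "CLEAR"
	}
}

func main() {
	fmt.Println(alertStatus(100, 100))
	fmt.Println(alertStatus(103, 100))
	fmt.Println(alertStatus(110, 100))
}
```

Walking a few values through the function is a quick way to sanity-check the thresholds before you put them in an alert definition.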

Some useful resources to get you up to speed quickly with creating alerts for our Geth node:

Note that if you create an alert and it works for you, a great idea is to open a PR against the main netdata/netdata repository. That way, the alert definition will exist in every Netdata installation, and you will help countless other Geth users.

Here are some useful resources to create new alerts:

Extend Geth collector for other clients

The beauty of this solution is that it's trivial to duplicate the collector and gather metrics from all Ethereum clients that support the Prometheus endpoint.

The only difference between a Geth collector and a Nethermind collector is that they might expose different metrics, or the same metrics under different Prometheus metric names. So, we just need to change the Prometheus metric names in the metrics.go source file and propagate any change to the other source files as well.

The logic that I described above stays exactly the same.
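As a rough illustration of why only the names change, the sketch below keeps one collection routine and swaps the metric-name table per client. Everything here is hypothetical: the client keys and the Prometheus metric names are placeholders, not names that any real client exposes, and a real port would simply edit metrics.go as described above.

```go
package main

import "fmt"

// metricNames maps a chart dimension to the Prometheus metric name each
// client exposes. The names below are placeholders, not real metric names.
var metricNames = map[string]map[string]string{
	"client_a": {"head_block": "client_a_chain_head_block"},
	"client_b": {"head_block": "client_b_blocks_head"},
}

// collect reads the client-specific metric names from a scrape and stores
// the values under client-independent dimension IDs.
func collect(client string, scrape map[string]float64) map[string]float64 {
	mx := make(map[string]float64)
	for dim, promName := range metricNames[client] {
		if v, ok := scrape[promName]; ok {
			mx[dim] = v
		}
	}
	return mx
}

func main() {
	fmt.Println(collect("client_a", map[string]float64{"client_a_chain_head_block": 100}))
	fmt.Println(collect("client_b", map[string]float64{"client_b_blocks_head": 98}))
}
```

Because the dimension IDs stay the same, the chart definitions and alerts can be reused with few or no changes.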

In conclusion

Extending Geth for more metrics is trivial.

As you may suspect, this guide is applicable to any data source that exposes its metrics using the Prometheus format.

How to extend the Geth-Netdata integration

Odysseas Lamtzidis