Prometheus: Kubernetes endpoints monitoring with blackbox-exporter
Arseny Zinchenko
Posted on December 12, 2022
The blackbox-exporter is an exporter that can monitor various endpoints — URLs on the Internet, your LoadBalancers in AWS, or Services in a Kubernetes cluster, such as MySQL or PostgreSQL databases.
Blackbox Exporter can give you HTTP response time statistics, response codes, information on SSL certificates, etc.
What we are going to do in this post:
- deploy the kube-prometheus-stack in Minikube with Helm
- deploy the Blackbox Exporter itself
- configure monitoring of endpoints with ServiceMonitors, which will be created through the blackbox-exporter config
- take a brief look at the Blackbox probes that are used to poll endpoints
Let’s go.
Running the Kube Prometheus Stack
We will do this setup in Minikube, where we will install the Prometheus Operator from the Helm repository.
Launch Minikube itself:
$ minikube start
Add the Prometheus chart repository:
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
Create a namespace:
$ kubectl create ns monitoring
Install the kube-prometheus-stack chart:
$ helm -n monitoring install prometheus prometheus-community/kube-prometheus-stack
Wait a few minutes until all pods become Running:
$ kubectl -n monitoring get pod
NAME                                                     READY   STATUS            RESTARTS      AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0   1/2     Running           1 (25s ago)   44s
prometheus-grafana-599dbccb79-zlklx                      2/3     Running           0             57s
prometheus-kube-prometheus-operator-689dd6679c-s66vp     1/1     Running           0             57s
prometheus-kube-state-metrics-6cfd96f4c8-84j26           1/1     Running           0             57s
prometheus-prometheus-kube-prometheus-prometheus-0       0/2     PodInitializing   0             44s
prometheus-prometheus-node-exporter-2h542                1/1     Running           0             57s
Find the Prometheus Service:
$ kubectl -n monitoring get svc
NAME                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                     ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   7s
prometheus-grafana                        ClusterIP   10.97.79.182    <none>        80/TCP                       20s
prometheus-kube-prometheus-alertmanager   ClusterIP   10.106.147.39   <none>        9093/TCP                     20s
prometheus-kube-prometheus-operator       ClusterIP   10.98.222.45    <none>        443/TCP                      20s
prometheus-kube-prometheus-prometheus     ClusterIP   10.107.26.113   <none>        9090/TCP                     20s
…
Open access to the Service by using port-forward:
$ kubectl -n monitoring port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090
Open http://localhost:9090, and check if everything is working:
Running blackbox-exporter
Its chart is in the same repository, so just install the exporter:
$ helm -n monitoring upgrade --install prometheus-blackbox prometheus-community/prometheus-blackbox-exporter
Check the Pod:
$ kubectl -n monitoring get pod
NAME READY STATUS RESTARTS AGE
prometheus-blackbox-prometheus-blackbox-exporter-6865d9b44h546j 1/1 Running 0 27s
…
Blackbox keeps its config in a ConfigMap, which is mounted into the Pod and provides the default parameters. See more here.
$ kubectl -n monitoring get cm prometheus-blackbox-prometheus-blackbox-exporter -o yaml
apiVersion: v1
data:
  blackbox.yaml: |
    modules:
      http_2xx:
        http:
          follow_redirects: true
          preferred_ip_protocol: ip4
          valid_http_versions:
          - HTTP/1.1
          - HTTP/2.0
        prober: http
        timeout: 5s
Here we can see the modules section with just one module so far, http_2xx, which uses the http prober to make HTTP requests to targets. The targets themselves still need to be added.
Blackbox and ServiceMonitor
To add the endpoints that we want to monitor, we can use ServiceMonitors, see config here.
For some reason, this part is not really described in the guides I googled, although it is very useful and simple: we add a list of targets to the values of the Blackbox chart, the chart creates a ServiceMonitor for each of them, and Prometheus starts monitoring them.
Create a file blackbox-exporter-values.yaml with only one endpoint for now, just to check that it works at all:
serviceMonitor:
  enabled: true
  defaults:
    labels:
      release: prometheus
  targets:
    - name: google.com
      url: https://google.com
If not specified otherwise, the default values from the chart's values.yaml are used. In this case, it is the http_2xx module, which sends a GET request and checks the response code: if a 2xx code is received, the check passes; any other code means it fails.
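Under the hood, Prometheus does not contact the target itself: each scrape is an HTTP request to the exporter's /probe endpoint, with the target and the module passed as query parameters. A minimal Python sketch of how such a URL is put together is below. The Service name and port 9115 (the exporter's default) are assumptions about this particular installation, so verify them with kubectl -n monitoring get svc.

```python
from urllib.parse import urlencode

# Assumption: the exporter's Service is named like its ConfigMap and listens
# on the exporter's default port, 9115. Verify with `kubectl -n monitoring get svc`.
BLACKBOX_ADDR = "prometheus-blackbox-prometheus-blackbox-exporter.monitoring:9115"


def probe_url(target: str, module: str = "http_2xx", exporter: str = BLACKBOX_ADDR) -> str:
    """Build the URL that is requested to probe `target` with `module`."""
    query = urlencode({"target": target, "module": module})
    return f"http://{exporter}/probe?{query}"


if __name__ == "__main__":
    print(probe_url("https://google.com"))
```

Requesting such a URL with curl from inside the cluster, or through a port-forward to the exporter, returns the probe metrics for that target, which can be handy for debugging.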
Update the Helm release with the new config:
$ helm -n monitoring upgrade --install prometheus-blackbox prometheus-community/prometheus-blackbox-exporter -f blackbox-exporter-values.yaml
Check if the ServiceMonitor has been created:
$ kubectl -n monitoring get servicemonitor
NAME AGE
prometheus-blackbox-prometheus-blackbox-exporter-google.com 4m43s
Check the Prometheus Targets:
For each target that we specify in the Blackbox configuration, a separate scrape job is added to Prometheus:
And check the Blackbox metrics:
The main metric that I personally use is probe_success, which tells whether the check has passed:
Here, metricRelabelings sets the target label to the value of the name field of the target from the Blackbox config, and the instance label holds the URL.
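The same check can be scripted against the Prometheus HTTP API. The sketch below is only an illustration: it assumes that the port-forward to localhost:9090 from the beginning of the post is still running, and the function names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Prometheus is reachable through the port-forward started earlier.
PROMETHEUS_URL = "http://localhost:9090"


def query_prometheus(expr: str, base_url: str = PROMETHEUS_URL) -> dict:
    """Run an instant query through the /api/v1/query endpoint."""
    url = f"{base_url}/api/v1/query?{urlencode({'query': expr})}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def failed_targets(response: dict) -> list:
    """Return the `target` labels of all series whose value is not 1."""
    failed = []
    for series in response["data"]["result"]:
        _timestamp, value = series["value"]  # the sample value arrives as a string
        if float(value) != 1.0:
            failed.append(series["metric"].get("target", "<unknown>"))
    return failed


if __name__ == "__main__":
    print(failed_targets(query_prometheus("probe_success")))
```

An empty list means that every probe known to Prometheus passed on its last scrape.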
Internal endpoints monitoring
Great — we went to Google, and it even works.
What about checking endpoints within a cluster?
Let’s take the nginx example from the Kubernetes documentation, but deploy its Pod and Service to our own namespace instead of default.
Create a namespace:
$ kubectl create ns test-ns
namespace/test-ns created
Create a manifest testpod-with-svc.yaml with the Pod and Service, and set your namespace:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: test-ns
  labels:
    app.kubernetes.io/name: proxy
spec:
  containers:
  - name: nginx
    image: nginx:stable
    ports:
      - containerPort: 80
        name: http-web-svc
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: test-ns
spec:
  selector:
    app.kubernetes.io/name: proxy
  ports:
  - name: name-of-service-port
    protocol: TCP
    port: 80
    targetPort: http-web-svc
Deploy it:
$ kubectl apply -f testpod-with-svc.yaml
pod/nginx created
service/nginx-service created
Check the resources:
$ kubectl -n test-ns get all
NAME        READY   STATUS    RESTARTS   AGE
pod/nginx   1/1     Running   0          23s

NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/nginx-service   ClusterIP   10.106.58.247   <none>        80/TCP    23s
Update the Blackbox config:
serviceMonitor:
  enabled: true
  defaults:
    labels:
      release: prometheus
  targets:
    - name: google.com
      url: https://google.com
    - name: nginx-test
      url: nginx-service.test-ns.svc.cluster.local:80
Update the Helm release:
$ helm -n monitoring upgrade --install prometheus-blackbox prometheus-community/prometheus-blackbox-exporter -f blackbox-exporter-values.yaml
Check ServiceMonitors again:
$ kubectl -n monitoring get servicemonitor
NAME AGE
prometheus-blackbox-prometheus-blackbox-exporter-google.com 12m
prometheus-blackbox-prometheus-blackbox-exporter-nginx-test 5s
And in a minute we can check probe_success:
In general, it is not necessary to specify the full name in the form of nginx-service.test-ns.svc.cluster.local: the short form servicename.namespace, that is nginx-service.test-ns, will be enough. Still, the full name, in my opinion, looks more usable in labels and alerts.
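For reference, the name forms under which a Service resolves from Pods in other namespaces can be listed with a small Python sketch. The helper name is hypothetical, and cluster.local is the usual default cluster domain, which may differ in your cluster.

```python
def service_dns_names(service: str, namespace: str, cluster_domain: str = "cluster.local") -> list:
    """Return DNS names of a Service, from the shortest form to the fully qualified one."""
    return [
        f"{service}.{namespace}",
        f"{service}.{namespace}.svc",
        f"{service}.{namespace}.svc.{cluster_domain}",
    ]


if __name__ == "__main__":
    # Any of these, followed by the port, could be used as the target url.
    for name in service_dns_names("nginx-service", "test-ns"):
        print(f"{name}:80")
```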
Blackbox Exporter modules
Everything looks great as long as we poll an ordinary HTTP endpoint that always returns a 200 code.
But how can we check for other HTTP codes?
Let’s create our own module using Blackbox probes:
config:
  modules:
    http_4xx:
      prober: http
      timeout: 5s
      http:
        method: GET
        valid_status_codes: [404, 405]
        valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
        follow_redirects: true
        preferred_ip_protocol: "ip4"
serviceMonitor:
  enabled: true
  defaults:
    labels:
      release: prometheus
  targets:
    - name: google.com
      url: https://google.com
    - name: nginx-test
      url: nginx-service.test-ns.svc.cluster.local:80
    - name: nginx-test-404
      url: nginx-service.test-ns.svc.cluster.local:80/404
      module: http_4xx
Here, in modules, we specify the name of the new module, http_4xx, the prober it should use, http, and the parameters for this prober: what kind of request to send, and which response codes we consider valid.
Next, in the targets, for nginx-test-404 we explicitly specify the http_4xx module.
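The effect of valid_status_codes can be illustrated with a few lines of Python. This is only a sketch of the decision, not the exporter's actual code: with no list given, the http prober accepts any 2xx code, otherwise only the listed codes. Other checks of a module, such as valid_http_versions, are left out here.

```python
from typing import Optional, Sequence


def status_is_valid(status_code: int, valid_status_codes: Optional[Sequence[int]] = None) -> bool:
    """Mimic how a module decides whether a response code passes the probe."""
    if not valid_status_codes:
        # No explicit list, as in http_2xx: any 2xx code is accepted.
        return 200 <= status_code <= 299
    return status_code in valid_status_codes


if __name__ == "__main__":
    http_4xx = [404, 405]  # the list from the http_4xx module
    print(status_is_valid(200))            # True: default 2xx rule
    print(status_is_valid(404))            # False: 404 is not a 2xx code
    print(status_is_valid(404, http_4xx))  # True: explicitly allowed
    print(status_is_valid(200, http_4xx))  # False: 200 is not in the list
```

The last case is worth keeping in mind: with the http_4xx module, a target that starts to return 200 is reported as failed.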
Modules testing
Let’s see how we can check whether the module will work as we expect.
Everything is simple: run a test Pod, and use curl with the -I option to check the response of the endpoint.
For a TCP connection, you can use telnet.
So, create a Pod with Ubuntu, and connect to it by running bash:
$ kubectl -n monitoring run pod --rm -i --tty --image ubuntu -- bash
Install curl and telnet:
root@pod:/# apt update && apt -y install curl telnet
And check if the nginx-service.test-ns.svc.cluster.local:80/404 is working and which response code it will return:
root@pod:/# curl -I nginx-service.test-ns.svc.cluster.local:80/404
HTTP/1.1 404 Not Found
404 — as we expected.
Update the Blackbox with a new configuration:
$ helm -n monitoring upgrade --install prometheus-blackbox prometheus-community/prometheus-blackbox-exporter -f blackbox-exporter-values.yaml
Let’s check its ConfigMap to see whether the http_4xx module that we specified in our config file has been added:
$ kubectl -n monitoring get cm prometheus-blackbox-prometheus-blackbox-exporter -o yaml
apiVersion: v1
data:
  blackbox.yaml: |
    modules:
      http_2xx:
        http:
          follow_redirects: true
          preferred_ip_protocol: ip4
          valid_http_versions:
          - HTTP/1.1
          - HTTP/2.0
        prober: http
        timeout: 5s
      http_4xx:
        http:
          follow_redirects: true
          method: GET
          preferred_ip_protocol: ip4
          valid_http_versions:
          - HTTP/1.1
          - HTTP/2.0
          valid_status_codes:
          - 404
          - 405
        prober: http
        timeout: 5s
And check the result in the Prometheus:
probe_success{target="nginx-test-404"} == 1
- "It works!" (c)
TCP Connect and a database server monitoring
Another module that we use very often is TCP, which simply tries to open a TCP connection to the specified host and port. It is suitable for checking databases and any other non-HTTP resources.
Let’s start a MySQL server:
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install mysql bitnami/mysql
Find its Service:
$ kubectl get svc
NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
kubernetes       ClusterIP   10.96.0.1      <none>        443/TCP    20h
mysql            ClusterIP   10.99.71.124   <none>        3306/TCP   40s
mysql-headless   ClusterIP   None           <none>        3306/TCP   40s
Update the Blackbox config:
config:
  modules:
    ...
    tcp_connect:
      prober: tcp
serviceMonitor:
  ...
  targets:
    ...
    - name: mysql
      url: mysql.default.svc.cluster.local:3306
      module: tcp_connect
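What the tcp prober behind tcp_connect does is roughly equivalent to the following Python sketch. It is an illustration only, not the exporter's code, and the function name and the 5-second timeout are assumptions made for this example.

```python
import socket


def tcp_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and DNS failures are all OSError subclasses.
        return False


if __name__ == "__main__":
    # This name only resolves from inside the cluster, e.g. from a test Pod.
    print(tcp_connect("mysql.default.svc.cluster.local", 3306))
```

Note that a successful connection only shows that the port accepts connections; it says nothing about whether the database itself answers queries.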
Deploy and check:
Prometheus alerting
There is nothing special to write about alerting: everything is standard, as with any other Prometheus alert.
For example, we monitor Apache Druid Services with the following alert (screen from a Terraform configuration with some variables):
It just checks that probe_success != 1.
Useful links
- Blackbox exporter probes — more probes examples
Originally published at RTFM: Linux, DevOps, and system administration.