Troubleshooting Kubernetes: Cluster and Node Logging

Troubleshooting any system can feel like detective work, and Kubernetes (K8s) adds an extra layer of mystery because of its distributed nature. Fortunately, Kubernetes provides many tools for diagnosing and debugging problems, and logging is one of the most critical and foundational of them. This blog post explores how to evaluate cluster and node logging to debug Kubernetes issues effectively.

Logging in Kubernetes: A Bird's Eye View

In Kubernetes, logs provide a wealth of information about cluster and node health, and they are vital for diagnosing problems. Kubernetes does not provide native storage for logs, but logs can be collected at several levels:

Cluster level: The control plane components of a Kubernetes cluster generate logs that provide insight into cluster-wide events.
Node level: Each node (worker or control plane) generates logs about the kubelet, the container runtime (for example containerd, CRI-O, or Docker Engine), and the workloads it has run. These logs describe events and issues tied to a specific node.
Pod level: Each container in a pod has logs specific to the process it is running. While not the focus of this post, these logs are also critical for debugging application-level problems.

Evaluating Cluster Level Logging

The first step in troubleshooting Kubernetes at the cluster level is evaluating the logs from the control plane components: the API server (kube-apiserver), the scheduler (kube-scheduler), the controller manager (kube-controller-manager), and etcd.

Here's how you can access these logs:

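If you use a managed Kubernetes service such as Google Kubernetes Engine (GKE) or Amazon EKS, the control plane runs on the provider's infrastructure, and the provider exposes its logs through its own interface or API (Cloud Logging on GKE; CloudWatch Logs on EKS, where control plane logging must be enabled first).
If you manage the control plane yourself, you can usually read these logs on the host that runs it. If the components run as systemd services, use the journalctl command to view them. In clusters built with kubeadm, the control plane components run as static pods in the kube-system namespace instead, so kubectl logs (or the files under /var/log/pods on the control plane node) is the place to look.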
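The exact commands depend on how the control plane was installed. The sketch below assumes two common setups: kubeadm-style clusters, where components run as static pods named <component>-<node-name> in kube-system, and clusters where components run as systemd units. The helper names are hypothetical, and the unit names you pass in are assumptions; check what your installer actually uses.

```shell
# Hypothetical helpers for reading control plane logs.

# kubeadm-style clusters: control plane components run as static pods in
# kube-system, named <component>-<node-name> (e.g. kube-apiserver-cp-1).
control_plane_pod_logs() {
  local component="$1" node="$2" lines="${3:-200}"
  kubectl logs -n kube-system "${component}-${node}" --tail="${lines}"
}

# Clusters where components run as systemd units. The unit name is an
# assumption; run this on the control plane host itself.
control_plane_unit_logs() {
  local unit="$1" since="${2:-1 hour ago}"
  sudo journalctl -u "$unit" --since "$since" --no-pager
}

# Example usage:
#   control_plane_pod_logs kube-apiserver cp-1
#   control_plane_unit_logs kube-scheduler "30 min ago"
```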

Evaluating Node Level Logging

The second step is to check the node-level logs, which provide insights into the workings of the kubelet and container runtime.

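For node-level logging, you typically need to access the host machine directly. If you have SSH access to the node, use journalctl (on systemd-based systems) for the kubelet and container runtime services. To inspect individual containers, use crictl, which works with any CRI-compatible runtime such as containerd or CRI-O; docker logs applies only if the node still runs Docker Engine, which since Kubernetes 1.24 (when dockershim was removed) requires the cri-dockerd adapter.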
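On nodes that use containerd or CRI-O, crictl talks to the runtime through the Container Runtime Interface, which helps when the API server or kubelet is unhealthy and kubectl logs is unavailable. A minimal sketch, assuming crictl is installed and configured for your runtime's socket; the helper name is hypothetical.

```shell
# Hypothetical helper: show the last lines of logs for a container on this
# node, looked up by name through the CRI (containerd, CRI-O, ...).
runtime_container_logs() {
  local name="$1" lines="${2:-100}" id
  # -a includes exited containers, --name filters by a name regex,
  # -q prints only container IDs; take the first match.
  id=$(sudo crictl ps -a --name "$name" -q | head -n 1)
  if [ -z "$id" ]; then
    echo "no container matching '$name' on this node" >&2
    return 1
  fi
  sudo crictl logs --tail "$lines" "$id"
}

# Example usage (on the node):
#   runtime_container_logs kube-proxy
#   sudo journalctl -u containerd --since "1 hour ago"   # the runtime's own logs
```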

The journalctl command is part of the systemd system and service manager, and it is used to query and display messages from the journal log. In the context of a Kubernetes cluster, it's helpful to view logs on both control plane nodes and worker nodes to troubleshoot issues and understand system behavior.

Here's a brief guide to using journalctl to check logs on control plane and worker nodes:

SSH into the node:

First, SSH into the node (either a control plane node or a worker node) whose logs you want to check.

```
ssh username@node-ip-address
```

Use journalctl to view logs:

Once you're on the node, use the journalctl command to view the logs.

To view all logs:
```
sudo journalctl
```
To view logs for a specific unit, such as kubelet, the primary "node agent" that runs on each node:
```
sudo journalctl -u kubelet
```
To view logs within a time window:
```
sudo journalctl --since "2019-07-05 21:30:01" --until "2019-07-05 21:30:02"
```
To view logs in reverse order (newest first):
```
sudo journalctl -r
```
To follow log output (similar to tail -f):
```
sudo journalctl -f
```

By default, journalctl shows every log on the system, so you will usually need filters to narrow things down to what is happening on your Kubernetes node. For problems inside a specific pod, the kubectl logs command is often more helpful.
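The journalctl filters above combine well with standard text tools. The kubelet logs through klog, whose lines start with a severity letter (I, W, E, or F) followed by the date, for example "E0705 21:30:02.000000 ...". journald typically stores the kubelet's output at a single default priority, so journalctl -p err may miss these lines, while filtering on the klog prefix catches them. A minimal sketch, assuming klog's default text header; the helper name is hypothetical.

```shell
# Hypothetical helper: print only error (E) and fatal (F) kubelet log lines
# from journald, assuming klog's default "Lmmdd hh:mm:ss.uuuuuu ..." header.
kubelet_errors() {
  local since="${1:-1 hour ago}"
  # -o cat prints only the message, so the klog severity letter starts the line.
  sudo journalctl -u kubelet --since "$since" -o cat --no-pager \
    | grep -E '^[EF][0-9]{4} '
}

# Example usage (on the node):
#   kubelet_errors "30 min ago"
```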

journalctl is well suited to inspecting a single node, but it is not a log aggregation solution. For production systems, consider a centralized logging solution, such as the ELK stack, Loki, or a managed service like Google Cloud's operations suite (formerly Stackdriver) or AWS CloudWatch, that collects and stores logs from all nodes and services and provides search and visualization.

Pod Level Logging

The kubectl logs command is a very handy tool in Kubernetes to help with the debugging of your applications. This command allows you to view the logs of a specific pod, which can be useful when troubleshooting or monitoring.

To view the logs of a specific pod:
```
kubectl logs <pod-name>
```
This prints the logs to your terminal. Here, <pod-name> is the name of the pod as it appears in the output of kubectl get pods.
If a pod has multiple containers, use -c to specify which container's logs you want to see; otherwise recent kubectl versions fall back to a single default container:
```
kubectl logs <pod-name> -c <container-name>
```
To follow a log in real time, similar to the tail -f command, you can use the -f flag:
```
kubectl logs -f <pod-name>
```
To view the logs of a container's previous instance, for example after it crashed and was restarted, use the --previous (-p) flag:
```
kubectl logs --previous <pod-name>
```
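kubectl logs also accepts a label selector and time-based filters, which helps when a Deployment spreads the same application across several pods. A minimal sketch; the helper name and default values are illustrative. Note that when a selector is used, kubectl shows only the last 10 lines per container unless --tail is set.

```shell
# Hypothetical helper: recent logs from every container in every pod that
# matches a label selector, prefixed with pod/container names and timestamped.
recent_logs_by_label() {
  local selector="$1" since="${2:-30m}" lines="${3:-200}"
  kubectl logs -l "$selector" --all-containers=true --prefix --timestamps \
    --since="$since" --tail="$lines"
}

# Example usage:
#   recent_logs_by_label app=web          # last 30 minutes, up to 200 lines per container
#   recent_logs_by_label app=web 2h 50
```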
Remember that kubectl logs only retrieves logs from the pod's standard output and standard error streams. If your application writes logs to a file and not to standard output or standard error, you won't see these logs with kubectl logs.
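For an application that logs to a file, one stopgap is to read the file from inside the container. This sketch assumes the container image ships the tail binary (minimal or distroless images may not); the helper name and the example path /var/log/app/app.log are hypothetical. For a lasting fix, have the application write to standard output and standard error, or add a sidecar container that streams the file to its own standard output.

```shell
# Hypothetical helper: read the last lines of a log file inside a running container.
# Assumes the image includes `tail`.
pod_file_logs() {
  local pod="$1" path="$2" lines="${3:-100}"
  kubectl exec "$pod" -- tail -n "$lines" "$path"
}

# Example usage:
#   pod_file_logs my-pod /var/log/app/app.log
```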

Logging Best Practices

While knowing how to access and evaluate logs is essential, following some best practices can help you make the most out of your logging strategy.

Centralize your logs: Given the distributed nature of Kubernetes, logs end up scattered across nodes, which makes troubleshooting challenging. Use a log shipper such as Fluentd or Fluent Bit to send them to a single backend, such as Elasticsearch or a cloud service like Google Cloud's operations suite (formerly Stackdriver) or AWS CloudWatch.
Structured logging: Ensure your applications and systems are logging in a structured format (like JSON), making logs easier to query and analyze.
Log levels: Use appropriate logging levels (INFO, WARN, ERROR, etc.) to make it easier to filter and search logs based on severity.
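Log retention and rotation: Know how long your logs are retained, and make sure they are rotated so they do not fill up node storage. The kubelet rotates container logs according to its containerLogMaxSize and containerLogMaxFiles settings, while journald applies its own size limits to system service logs.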
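Structured logs pay off at query time. If an application writes one JSON object per line, a tool such as jq can filter kubectl logs output by field. A minimal sketch, assuming jq is installed and that entries carry a "level" field; the field name and helper name are hypothetical. Lines that are not JSON objects are skipped instead of aborting the filter.

```shell
# Hypothetical helper: show only entries whose "level" field equals the
# requested value (default "error") from a pod that logs JSON lines.
pod_json_logs_by_level() {
  local pod="$1" level="${2:-error}"
  # -R reads raw lines; fromjson? parses each one and drops unparsable lines;
  # objects keeps only JSON objects; -c prints each match on one line.
  kubectl logs "$pod" \
    | jq -cR --arg level "$level" 'fromjson? | objects | select(.level == $level)'
}

# Example usage:
#   pod_json_logs_by_level my-pod          # errors only
#   pod_json_logs_by_level my-pod warn
```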

Conclusions

While Kubernetes does not offer an out-of-the-box logging solution, the flexibility it provides allows you to design a logging strategy that suits your needs. Understanding how to evaluate and leverage Kubernetes logs at the cluster and node level is a vital skill in your Kubernetes troubleshooting toolkit. As your clusters grow, so will the complexity of logging, but with the right logging strategy, you can ensure you're well-equipped to handle any issues that come your way. Happy troubleshooting!
