Kubernetes 101, part III, controllers and self-healing

leandronsp

Leandro Proença

Posted on March 12, 2023


The second part of this series explained how Pods work by building a Pod with two containers that communicate with each other through a FIFO on a shared volume.

In this post we'll learn about self-healing systems and what we can achieve by delegating Pod management to Kubernetes workload resources, which manage Pods on our behalf.

🚂 Accidents will happen

Let's say we have a single healthy node with multiple Pods running on it. What if the node suffers a critical hardware failure that makes it unhealthy? Remember: a Kubernetes node is a machine, usually a virtual machine.

[Figure: a node failure]

Pods are bound to the node they were scheduled on, so the Pods on the unhealthy node begin to fail as well.

⬇️ Application downtime

A new node is required. But provisioning hardware is costly and takes time.

Meanwhile, the Pods remain failed on the unhealthy node and the application suffers downtime.

Once the new node has joined the cluster and is ready to accept Pods, we can manually recreate each Pod with kubectl so that it runs on the new, healthy node, for instance:



$ kubectl apply -f ./pod.yml



[Figure: starting pods in the new node]

Until the Pod is running and ready, the application remains out of service:

[Figure: application downtime]

Managing Pods directly is inefficient and cumbersome, not to mention that our application would go down every time a node fails.

We should build a system that detects failures and restarts components or applications automatically, with no human intervention.

We need a self-healing system.

🤖 Self-healing and Robotics

Building a self-healing system is crucial for businesses. Whenever our infrastructure suffers a disruption, such as a network or hardware failure, the system should be capable of "healing itself".

Automation is key. And a potential solution for self-healing comes from Robotics.

In Robotics, we usually build a controller that receives a desired state and, through a control loop, continuously checks whether the current state matches it, acting to bring the current state as close to the desired one as possible.

[Figure: controller and control loop]

A thermostat follows exactly this controller pattern: it continuously compares the current temperature with the desired one and turns the equipment on to close the gap. Once they match, the controller turns the equipment off, and the process repeats over and over again.
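To make the loop concrete, here is a toy Python simulation of a thermostat. Everything in it is an assumption made for illustration: the hypothetical `Room` warms up by 1 degree per tick while the heater is on and cools down by 0.5 degrees per tick otherwise. What matters is the shape of `control_loop`: observe the current state, compare it with the desired state, act, and repeat.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Room:
    """Hypothetical room with made-up physics, for illustration only."""
    temperature: float
    heater_on: bool = False

    def tick(self) -> None:
        # Assumption: +1.0 degree per tick with the heater on, -0.5 with it off.
        self.temperature += 1.0 if self.heater_on else -0.5


def control_loop(room: Room, desired: float, ticks: int) -> list[float]:
    """Run the control loop for a fixed number of ticks, recording temperatures."""
    history: list[float] = []
    for _ in range(ticks):
        current = room.temperature           # 1. observe the current state
        room.heater_on = current < desired   # 2. compare with the desired state and act
        room.tick()                          # 3. time passes, then the loop repeats
        history.append(room.temperature)
    return history


if __name__ == "__main__":
    print(control_loop(Room(temperature=18.0), desired=21.0, ticks=7))
```

Starting at 18 degrees with a target of 21, the temperature climbs to 21 and then oscillates between 20.5 and 21.5: the loop never really stops, it just keeps correcting small deviations.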

Luckily, Kubernetes implements this controller pattern, which solves our problem: we no longer need to manage Pods directly.

We are talking about Kubernetes Controllers.

Controllers

Kubernetes controllers are control loops that watch the state of the cluster, then make or request changes to move the current state as close as possible to the desired state.
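A single reconcile step can be sketched in Python as a diff between the desired and the current state. This is a simplified model, not Kubernetes code: both states are plain dictionaries mapping a hypothetical object name to its spec, whereas real controllers read and write objects through the Kubernetes API server.

```python
from __future__ import annotations


def reconcile(desired: dict[str, str], current: dict[str, str]) -> list[tuple[str, str]]:
    """Diff desired vs current state and return the actions needed to converge."""
    actions: list[tuple[str, str]] = []
    for name, spec in desired.items():
        if name not in current:
            actions.append(("create", name))  # missing object
        elif current[name] != spec:
            actions.append(("update", name))  # object drifted from its spec
    for name in current:
        if name not in desired:
            actions.append(("delete", name))  # object nobody asked for
    return actions


def act(actions: list[tuple[str, str]], desired: dict[str, str], current: dict[str, str]) -> None:
    """Carry out the actions against the simulated current state."""
    for verb, name in actions:
        if verb == "delete":
            del current[name]
        else:  # "create" and "update" both make current match desired
            current[name] = desired[name]


if __name__ == "__main__":
    desired = {"a": "v2", "b": "v1"}
    current = {"a": "v1", "c": "v1"}
    actions = reconcile(desired, current)
    print(actions)  # [('update', 'a'), ('create', 'b'), ('delete', 'c')]
    act(actions, desired, current)
    print(reconcile(desired, current))  # []
```

Since each step compares whole states instead of reacting to individual events, running it again after convergence produces no actions, which is what makes it safe to run in an endless loop.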

But how do we make use of controllers? Kubernetes provides several workload resources, each driven by a controller, so we can rely on them to manage Pods on our behalf.

Time to explore one of the main workload resources that provide self-healing: the ReplicaSet.


ReplicaSet

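A ReplicaSet keeps a specified number of identical Pods running: we declare the number of replicas, a label selector and a Pod template, and its controller takes care of the rest.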



### The kind of the Kubernetes object
kind: ReplicaSet
apiVersion: apps/v1
metadata:
  name: nginx
spec:
  ### The number of nginx Pod replicas
  ### The controller manages the Pods on our behalf
  ### Whenever a Pod goes down, the controller creates a new one so that exactly 2 nginx Pods keep running
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx


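What the controller does with this spec can be approximated in Python. This is a simplified sketch, not the real ReplicaSet controller: `matches` and `reconcile_replicaset` are hypothetical helpers, Pods are plain dicts, and only equality-based `matchLabels` is modeled. The idea is to count the Pods whose labels match the selector and then create or delete Pods until that count equals `replicas`.

```python
from __future__ import annotations


def matches(match_labels: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """A Pod matches when it carries every key/value pair listed in matchLabels."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


def reconcile_replicaset(replicas: int, match_labels: dict[str, str],
                         pods: list[dict]) -> tuple[int, list[str]]:
    """Return (number of Pods to create, names of Pods to delete)."""
    selected = [pod for pod in pods if matches(match_labels, pod["labels"])]
    diff = replicas - len(selected)
    if diff > 0:
        # Too few: the controller would create Pods from the Pod template.
        return diff, []
    # Exactly right (diff == 0) or too many: delete the surplus, if any.
    surplus = -diff
    return 0, [pod["name"] for pod in selected[:surplus]]


if __name__ == "__main__":
    pods = [
        {"name": "nginx-k87fz", "labels": {"app": "nginx"}},
        {"name": "my-pod", "labels": {"app": "redis"}},  # hypothetical, not selected
    ]
    print(reconcile_replicaset(2, {"app": "nginx"}, pods))  # (1, [])
```

This sketch simply deletes the first surplus Pods it finds, while the real controller is smarter about which ones to pick. Deleting a Pod by hand makes the count drop below `replicas`, so the next reconcile asks for a new Pod.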

After applying the YAML file with kubectl apply, we can list the ReplicaSet object:



$ kubectl get replicasets
NAME    DESIRED   CURRENT   READY   AGE
nginx   2         2         2       13m



Also, checking the Pods:



$ kubectl get pods
NAME          READY   STATUS    RESTARTS   AGE
nginx-r5kmn   1/1     Running   0          15m
nginx-k87fz   1/1     Running   0          15m



Note that each Pod name is made of the ReplicaSet name followed by a random suffix: <replicaSetName>-<randomSuffix>.
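The naming scheme can be mimicked in Python. This sketch assumes the prefix is the ReplicaSet name plus a dash and that the suffix is 5 characters drawn from the alphabet `bcdfghjklmnpqrstvwxz2456789`, modeled on Kubernetes' name generator; treat the alphabet and length as assumptions. `generate_pod_name` and `looks_generated` are hypothetical helpers, not part of any Kubernetes client.

```python
from __future__ import annotations

import random
import re

# Assumption: suffix alphabet and length modeled on Kubernetes' name generator
# (consonants and digits only: no vowels, and no 0, 1 or 3).
ALPHABET = "bcdfghjklmnpqrstvwxz2456789"
SUFFIX_LENGTH = 5


def generate_pod_name(replicaset_name: str, rng: random.Random | None = None) -> str:
    """Hypothetical helper returning <replicaSetName>-<randomSuffix>."""
    if rng is None:
        rng = random.Random()
    suffix = "".join(rng.choice(ALPHABET) for _ in range(SUFFIX_LENGTH))
    return f"{replicaset_name}-{suffix}"


def looks_generated(name: str, replicaset_name: str) -> bool:
    """Check whether a Pod name follows the assumed <prefix>-<suffix> scheme."""
    pattern = rf"{re.escape(replicaset_name)}-[{ALPHABET}]{{{SUFFIX_LENGTH}}}"
    return re.fullmatch(pattern, name) is not None


if __name__ == "__main__":
    print(generate_pod_name("nginx"))  # nginx followed by 5 random characters
    for observed in ["nginx-r5kmn", "nginx-k87fz", "nginx-mr2rd"]:
        print(observed, looks_generated(observed, "nginx"))
```

The Pod names seen in this post all fit the pattern. In this example the ReplicaSet name and the `app` label value are both `nginx`, which is why the prefix could be mistaken for the label.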

The ReplicaSet can be pictured as follows:

[Figure: A ReplicaSet controller]

In the picture above, note that the Pods may run on different Nodes: the Kubernetes scheduler, which decides where each Pod runs, tends to spread the Pods of a ReplicaSet across Nodes. That's exactly the resilience and self-healing capability we want!

If one Node becomes unhealthy, the Pod running on the other Node keeps the application available, while the controller creates a replacement Pod on a healthy Node to get back to the desired number of replicas.
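The scenario can be simulated in Python under heavy simplifications. The hypothetical `Cluster` class plays the scheduler by placing each new Pod on the healthy node running the fewest Pods, and `fail_node` drops every Pod bound to the failed node at once (in a real cluster, Pods on an unreachable node are only evicted after a timeout). Node names are made up, and Pod names are sequential for readability instead of random.

```python
from __future__ import annotations

from itertools import count


class Cluster:
    """Toy cluster tracking which Pod runs on which node (names are hypothetical)."""

    def __init__(self, nodes: list[str]) -> None:
        self.healthy = set(nodes)
        self.pods: dict[str, str] = {}  # Pod name -> node name
        self._ids = count(1)

    def schedule(self, prefix: str) -> str:
        # Simplified scheduler: pick the healthy node running the fewest Pods,
        # breaking ties by node name so the result is deterministic.
        load = {node: 0 for node in self.healthy}
        for node in self.pods.values():
            load[node] += 1
        target = min(sorted(load), key=lambda node: load[node])
        name = f"{prefix}-{next(self._ids)}"
        self.pods[name] = target
        return name

    def fail_node(self, node: str) -> None:
        # Pods are bound to their node, so they are lost along with it.
        self.healthy.discard(node)
        self.pods = {pod: n for pod, n in self.pods.items() if n != node}


def reconcile(cluster: Cluster, prefix: str, replicas: int) -> None:
    """ReplicaSet-like loop body: create Pods until the desired count is met."""
    while len(cluster.pods) < replicas:
        cluster.schedule(prefix)


if __name__ == "__main__":
    cluster = Cluster(["node-1", "node-2"])
    reconcile(cluster, "nginx", replicas=2)
    print(cluster.pods)  # {'nginx-1': 'node-1', 'nginx-2': 'node-2'}
    cluster.fail_node("node-1")
    print(cluster.pods)  # {'nginx-2': 'node-2'}, the application is still up
    reconcile(cluster, "nginx", replicas=2)
    print(cluster.pods)  # {'nginx-2': 'node-2', 'nginx-3': 'node-2'}
```

Because the two replicas start on different nodes, losing `node-1` never leaves the application without a running Pod, and the next reconcile brings the count back to 2.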

Deleting a Pod of a ReplicaSet

If we delete a Pod that was created by a ReplicaSet, the controller automatically creates a new one to replace it:



$ kubectl delete pod nginx-r5kmn
pod "nginx-r5kmn" deleted



Checking Pods again:



$ kubectl get pods
NAME          READY   STATUS    RESTARTS   AGE
nginx-k87fz   1/1     Running   0          29m

### The new Pod
nginx-mr2rd   1/1     Running   0          28s



Deleting a ReplicaSet

If we want to get rid of all the Pods of a ReplicaSet, deleting them one by one won't work, since the controller keeps recreating them. Instead, we should delete the ReplicaSet itself:



$ kubectl delete replicaset nginx
replicaset.apps "nginx" deleted



And the Pods are finally gone:



$ kubectl get pods
No resources found in default namespace.
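Why do the Pods disappear along with the ReplicaSet? Each Pod created by the controller carries a `metadata.ownerReferences` entry pointing to its ReplicaSet, and by default Kubernetes garbage-collects dependents whose owner has been deleted (cascading deletion). The Python sketch below models only that idea; `Obj` and `delete_cascading` are hypothetical names, and the real garbage collector works asynchronously through the API server.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Obj:
    """Hypothetical stand-in for a Kubernetes object."""
    kind: str
    name: str
    owner: str | None = None  # simplified ownerReferences: at most one owner, by name


def delete_cascading(objects: dict[str, Obj], name: str) -> list[str]:
    """Delete an object and, transitively, every object it owns."""
    deleted: list[str] = []
    pending = [name]
    while pending:
        target = pending.pop()
        if target not in objects:
            continue
        del objects[target]
        deleted.append(target)
        # Dependents of the object we just deleted are garbage-collected too.
        pending.extend(obj.name for obj in objects.values() if obj.owner == target)
    return deleted


if __name__ == "__main__":
    objects = {obj.name: obj for obj in [
        Obj("ReplicaSet", "nginx"),
        Obj("Pod", "nginx-k87fz", owner="nginx"),
        Obj("Pod", "nginx-mr2rd", owner="nginx"),
        Obj("Pod", "my-pod"),  # hypothetical standalone Pod without an owner
    ]}
    print(delete_cascading(objects, "nginx"))
    print(list(objects))  # ['my-pod']
```

kubectl can skip this behavior: `kubectl delete replicaset nginx --cascade=orphan` deletes only the ReplicaSet and leaves its Pods running.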



Wrapping Up

In this post we've seen how network or hardware failures can impact our application, hence the importance of a self-healing system.

On top of that, we learned about Kubernetes controllers, how they solve the self-healing problem, and one of the most important workload resources in Kubernetes: the ReplicaSet.

The upcoming posts will keep focusing on workload resources, more precisely on how to roll out Deployments, define stateful Pods, run one Pod per node, and run Pods that perform a single task and then stop (Jobs).

Cheers!
