Visibility into Failing Kubernetes Pods
Joseph D. Marhee
Posted on December 15, 2018
A status of CrashLoopBackOff usually means your Pod was created and its images pulled, but something is causing the container process to exit: the container fails, Kubernetes starts a backoff timer, and then restarts it. Getting access to a container is sometimes as simple as exec-ing into the Pod's container in question:
```
kubectl exec -ti <pod name> -c <container> -- /bin/sh
```
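Before reaching for a shell at all, it's worth pulling what the failing container already left behind. Something like the following (pod and container names are placeholders) usually narrows down why the process keeps exiting:

```
# Which pods are crash-looping, and how many restarts they've racked up
kubectl get pods

# The Events section at the bottom often names the culprit
# (failed probes, OOMKilled, image pull errors)
kubectl describe pod <pod name>

# Logs from the previous, crashed instance of the container
kubectl logs <pod name> -c <container> --previous
```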
If the Pod's container doesn't support exec-ing in like that (it happens), you can do one of two things:
- Override the entrypoint and start the container with a new command that lets you get in and debug (useful if you suspect the entrypoint command is failing because of something like network connectivity or DNS resolution: you can log in and ping and netcat and dig whatever you want), or…
- Run a sidecar that gives you a shell; anything the main container listens on is then reachable over the Pod's localhost network.
The first method is fairly straightforward: you just declare a command field in the container's spec:
```
metadata:
  labels:
    app: web
spec:
  containers:
  - name: web-app
    image: bizco/neat-frontend:latest
    ports:
    - containerPort: 8081
    command:
    - "/bin/sh"
    - "-c"
    - "sleep 36000"
```
The sleep command keeps the container running, effectively overriding the image's original entrypoint, so you can log right into the desired image and debug from the inside.
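Once that change rolls out, exec-ing in works as before; the pod name below is a placeholder, and the tools you'll actually have inside depend on what the bizco/neat-frontend image ships:

```
# Shell into the web-app container, whose entrypoint is now just sleeping
kubectl exec -ti <web-app pod name> -c web-app -- /bin/sh

# From inside, test whatever you suspect the real entrypoint was tripping over,
# e.g. DNS resolution or connectivity to an upstream (names here are hypothetical):
#   nslookup some-upstream-service
#   nc some-upstream-service 5432
```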
The second case is useful when the software isn't failing outright, but you suspect there's an issue reaching the service, or resolving something for the Pod more broadly, or when the image doesn't lend itself to a command override, so exec alone won't cut it while leaving the service running in its container. Here you can add a new container as a sidecar to give you a shell to debug from:
```
metadata:
  labels:
    app: web
spec:
  containers:
  - name: web-app
    ...
  - name: debug-shell
    image: alpine:3.6
    command:
    - "/bin/sh"
    - "-c"
    - "sleep 36000"
```
Everything running in the Pod that listens on a port is then available over localhost. For example, say you have a Pod running a database like MongoDB: the service hasn't failed, but it seems to hit snags on certain operations, and you suspect an issue pulling down new data to feed to MongoDB, which is configured like this:
```
containers:
- name: mongodb
  image: mongo
  ports:
  - containerPort: 27017
```
If you add that second container, you can reach MongoDB without going through its Service name in KubeDNS (e.g. mongodb.whatevernamespace.svc.cluster.local); you can just hit localhost:27017 inside that Pod, as you would with any other service running in that Pod.
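To make that concrete, here's roughly how the check might look from the sidecar. The pod name is a placeholder, and this assumes the busybox nc in alpine:3.6 supports a connect-only scan (if not, you can apk add netcat-openbsd or a proper MongoDB client):

```
# Shell into the debug-shell sidecar sitting next to mongodb
kubectl exec -ti <mongodb pod name> -c debug-shell -- /bin/sh

# From inside the sidecar, confirm mongod is reachable over the Pod's localhost
nc -z -w 2 localhost 27017 && echo "mongod is listening"

# And check resolution/connectivity for the upstream you suspect is flaky (hypothetical name)
nslookup some-data-feed.example.com
```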