[EKS] Pods stuck in Init/ContainerCreating state

Hazmei

Posted on December 19, 2023


What is EKS?

EKS, aka Elastic Kubernetes Service, is a managed Kubernetes service offered by AWS. AWS manages the control plane of the Kubernetes cluster while you manage the data plane. Here at Ascenda Loyalty, we have been running our applications on EKS for more than a year.

Some background info.

Recently we have been observing a couple of pods stuck in the ContainerCreating state for more than 10 minutes. For context, we are using security groups for pods for the application pods, running on m5a.xlarge EC2 instances.

If you are familiar with EKS and security groups for pods, you'll know the feature is only supported on most Nitro-based Amazon EC2 instance families, and it comes with a lower limit on the maximum number of pods (if all the pods use security groups).
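For readers who haven't used the feature: security groups are attached to pods through a SecurityGroupPolicy custom resource. The sketch below shows the general shape; the names, labels, and security group ID are placeholders rather than our actual configuration.

```yaml
# Sketch of a SecurityGroupPolicy (placeholder names and IDs).
# Pods matching the selector get a branch network interface (pod ENI)
# with the listed security groups attached.
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: webapp-sgp
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: webapp
  securityGroups:
    groupIds:
      - sg-0123456789abcdef0
```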

What happened?

Recently we increased the pod replicas and started seeing more frequent deployment failures due to pods staying in the Init/ContainerCreating state for a long, long time (sometimes beyond 10 minutes).

[Screenshot: kubectl output listing all pods]

So... What gives?

From the initial look, it seems that the pods are not getting a private IPv4 address from the controller. This causes them to stay in the Init/ContainerCreating state until one is assigned. We can rule out a scheduling issue, as the pods did get scheduled onto nodes.

The first thing that comes to mind is to check the available private IPv4 addresses in the subnet, in case we've exhausted the whole allocated IP range. This is not the case, so let's move on.
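For reference, a check along these lines is enough to rule that out; the subnet ID here is a placeholder.

```bash
# Remaining private IPv4 addresses in the pod subnet (placeholder subnet ID)
aws ec2 describe-subnets \
  --subnet-ids subnet-0123456789abcdef0 \
  --query 'Subnets[].{Subnet:SubnetId,FreeIPs:AvailableIpAddressCount}' \
  --output table
```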

The other thing that comes to mind is that we ran out of branch network interfaces (pod ENIs) on the affected worker nodes, so off we go to run the following commands (sketched concretely right after the list):

  • kubectl get pods -A -o wide: check which pods are affected and which node they're scheduled on.
  • kubectl describe -n <namespace> pods <pod name>: check the status of the pod. If it's due to the pod ENI hitting its limit, it will show up in the status.
  • kubectl describe nodes <node name>: check the allocated resources for vpc.amazonaws.com/pod-eni. We know that with an m5a.xlarge instance, the maximum number of pod ENIs is 18 per instance.
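Concretely, the checks look roughly like this (pod and node names are placeholders):

```bash
# Which pods are affected and which node they're scheduled on
kubectl get pods -A -o wide

# Pod status and events; a pod ENI shortage would show up here
kubectl describe -n <namespace> pods <pod name>

# Capacity, allocatable and allocated vpc.amazonaws.com/pod-eni on the node
kubectl describe nodes <node name> | grep -i pod-eni
```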

It doesn't seem that we've maxed out our branch ENI usage. 🤔

Let's dig a little further elsewhere, since this is related to the pods not getting an IP address. One thing that came to mind is the AWS CNI that we use. The version deployed at that time was v1.7.10. There might be a bug in that version causing these random failures.

A quick google search brought us here. Most of the solutions point to upgrading the AWS CNI to version ≥ v1.7.7 (which we were already on). There were also other comments stating that certain environment variables needed to be set to use security groups for pods (which we had configured correctly). The AWS CNI had newer releases at that time, the latest being v1.9.0, and with no options left, we upgraded to the latest CNI version.
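For reference, this is roughly how the CNI version and the relevant environment variable can be checked; treat it as a sketch rather than the full checklist from the AWS docs.

```bash
# Running AWS VPC CNI version
kubectl describe daemonset aws-node -n kube-system | grep amazon-k8s-cni: | cut -d ":" -f 3

# Security groups for pods requires ENABLE_POD_ENI=true on the aws-node daemonset
kubectl get daemonset aws-node -n kube-system -o yaml | grep -A1 ENABLE_POD_ENI
```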

Everything seems fine for a few hours until the same error returns to haunt us.


Fast forward

After opening an AWS support ticket and going back and forth with the AWS engineer, we found that it was indeed due to the pod ENI limit. Our usage of security groups for pods was ultimately causing this error: failed to assign an IP address to container.


Although there are shortfalls to using security groups for pods in EKS (fewer pods per node), we're still using them to maintain a high level of security between different AWS resources such as RDS and ElastiCache (Memcached).

Why didn’t we notice that we ran out of pod ENIs in the first place?

For each application, we deploy a Kubernetes Job that runs a db migration step before deploying a set of webapp and worker pods. These consume pod ENIs, as they also use security groups for pods.

When we first checked whether we were hitting the pod ENI limit, we executed these commands:

  • kubectl get pods -A -o wide
  • kubectl describe -n <namespace> pods <pod name>
  • kubectl describe nodes <node name>

Upon further inspection of the output from kubectl describe nodes, there's a discrepancy between the reported allocated resources for vpc.amazonaws.com/pod-eni and the number of pods that use pod ENIs. We can verify this by running kubectl get pods -o wide -A | grep <node name> and counting the number of pods that use security groups for pods. The discrepancy was somewhere between 1 and 6 pods compared with what the describe node command reported.
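In other words, compare these two views of the same node (node name is a placeholder):

```bash
# What the node reports for vpc.amazonaws.com/pod-eni (capacity, allocatable, allocated)
kubectl describe nodes <node name> | grep -i pod-eni

# How many pods are actually scheduled on that node; count the ones
# that use security groups for pods and compare with the number above
kubectl get pods -o wide -A | grep <node name> | wc -l
```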

What’s causing this discrepancy?

It’s the db migration jobs. These use Kubernetes Jobs together with security groups for pods. On completion, the allocated pod ENI does not get detached, and this is not reflected properly in the output of kubectl describe node <node name>. That command only reports running pods and does not include completed pods.
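One quick way to spot this is to list the completed pods still sitting on the node (node name is a placeholder):

```bash
# Completed (Succeeded) pods still present on the node; based on the behaviour
# we observed, each of these can still be holding on to a branch ENI
kubectl get pods -A -o wide --field-selector status.phase=Succeeded | grep <node name>
```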

What now?

These are some of the possible solutions:

  1. Specify .spec.ttlSecondsAfterFinished in the Job manifest (a sketch follows this list).
    Not possible for us at the moment. This feature is still in alpha on Kubernetes v1.19, and EKS does not enable pre-beta features.

  2. Set the CI/CD system to delete the Kubernetes Job after it completes successfully.
    This is the most suitable solution for us. We can remove the successful Job, since keeping it around serves no purpose and consumes one pod ENI per pod.

  3. Run the db migration as part of the webapp's initContainer.
    We would free up one pod ENI per application since it would run in the same pod. However, this requires a bit of work on our CI/CD and Helm charts, and we would have some uncertainty about the impact on the system.
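For illustration, here is roughly what option 1 would look like once ttlSecondsAfterFinished is usable; the Job name, image, and migration command are placeholders, not our actual manifest.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: webapp-db-migrate          # placeholder name
spec:
  # Delete the Job (and its completed pod, releasing the branch ENI)
  # 100 seconds after it finishes. Requires the TTLAfterFinished feature,
  # which was still alpha on Kubernetes v1.19.
  ttlSecondsAfterFinished: 100
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: example.com/webapp:latest   # placeholder image
          command: ["./migrate-db.sh"]       # placeholder migration command
```

Option 2 boils down to having the pipeline run kubectl delete job <job name> once the migration succeeds, which removes the completed pod and releases its branch ENI.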


Posted in 2020.
