How to deal with stuck ECS deployment

killdozerx2

Himanshu Pant

Posted on December 20, 2023


One of the hardest problems I've ever dealt with is getting ECS deployments to work, and work nicely, with CloudFormation. It is hard for reasons that only AWS knows.

TLDR

If your CloudFormation stack update containing an ECS service doesn't finish in a reasonable amount of time (roughly 10-15 minutes, depending on previous deployment times and container image size), you have a problem with your service.

Why does it stay stuck?

Idk, it just does. This is one of those things that CloudFormation (or any deployment tool, really) sucks at: CloudFormation waits for several hours before rolling back to the previous configuration. If the issue that caused the failure persists during the rollback, the stack sits in UPDATE_ROLLBACK_IN_PROGRESS status. Finally, the stack changes to UPDATE_ROLLBACK_FAILED status, and that is DevOps hell.
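If you'd rather not keep refreshing the console, here is a minimal sketch of that check in Python. It assumes you pass in a boto3 CloudFormation client; the function name `stack_looks_stuck` and the 15-minute threshold are my own choices (the threshold is just the TLDR rule of thumb, not an AWS limit).

```python
from datetime import datetime, timedelta, timezone

# Rule of thumb from the TLDR above, not an AWS limit. Tune it to your own deploy times.
STUCK_AFTER = timedelta(minutes=15)


def stack_looks_stuck(cfn_client, stack_name, now=None):
    """Return (status, stuck) for a stack.

    cfn_client is assumed to be a boto3 CloudFormation client,
    e.g. boto3.client("cloudformation").
    """
    now = now or datetime.now(timezone.utc)
    stack = cfn_client.describe_stacks(StackName=stack_name)["Stacks"][0]
    status = stack["StackStatus"]

    # A failed rollback will not fix itself, so always flag it.
    if status == "UPDATE_ROLLBACK_FAILED":
        return status, True

    # LastUpdatedTime is only present once the stack has been updated at least once.
    started = stack.get("LastUpdatedTime")
    in_flight = status in ("UPDATE_IN_PROGRESS", "UPDATE_ROLLBACK_IN_PROGRESS")
    too_long = started is not None and now - started > STUCK_AFTER
    return status, in_flight and too_long
```

Something like `stack_looks_stuck(boto3.client("cloudformation"), "my-stack")` would then tell you whether it's time to start digging (`my-stack` is a placeholder).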

What’s really happening though?

ECS really likes to be stable. If the service can't reach a steady state, it won't finish updating (it never tells CloudFormation that the update finished), and it won't roll back either. So the main goal is to get the service to reach stability. There can be many reasons for it not getting there, and an exhaustive list is beyond the scope of this article.

So what could’ve gone wrong?

Here are some common reasons why an Amazon ECS service can fail to launch new tasks:

  • Container image issues
  • A lack of necessary resources for launching tasks
  • A health check failure on a load balancer
  • Instance configuration or Amazon ECS container agent issues

An Amazon ECS service that fails to launch tasks causes AWS CloudFormation to get stuck in UPDATE_IN_PROGRESS status. You can quickly check this by opening the service in the ECS console, selecting Deployments, and checking the status of the latest deployment.
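The same check can be scripted. This is a rough sketch that assumes a boto3 ECS client is passed in; `latest_deployment` is a hypothetical helper name, and it simply summarizes what `describe_services` returns for the PRIMARY (latest) deployment.

```python
def latest_deployment(ecs_client, cluster, service):
    """Summarize the latest (PRIMARY) deployment of an ECS service.

    ecs_client is assumed to be a boto3 ECS client, e.g. boto3.client("ecs").
    """
    svc = ecs_client.describe_services(cluster=cluster, services=[service])["services"][0]

    # The deployment currently being rolled out has status PRIMARY.
    primary = next(d for d in svc["deployments"] if d["status"] == "PRIMARY")

    return {
        "rollout_state": primary.get("rolloutState"),
        "desired": primary["desiredCount"],
        "running": primary["runningCount"],
        "failed_tasks": primary.get("failedTasks", 0),
        # First few service events as returned by the API; these usually
        # say why tasks could not be placed or kept running.
        "recent_events": [e["message"] for e in svc.get("events", [])[:5]],
    }
```

If `running` stays below `desired` while `failed_tasks` keeps climbing, the service is not going to stabilize on its own.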

So what do I even do now?

Delete the stack and recreate!


The main goal is to get your CloudFormation stack to say UPDATE_COMPLETE. Left alone, it can take the stack several hours to stabilize. To stabilize it more quickly, just lie to it.

Disclaimer

The following resolution is intended to help you stabilize an AWS CloudFormation stack quickly without waiting for the stack to time out. The resolution isn't intended for production environments, as the Amazon ECS service is out of sync with the known state of AWS CloudFormation. To sync resources between your Amazon ECS service and the AWS CloudFormation stack, you must perform an error-free update on the stack.

Resolution

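Change the desired task count of the Amazon ECS service to 0. With zero tasks desired and zero tasks running, the service counts as stable, so ECS reports the update as finished and CloudFormation stops waiting.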

  1. Open the Amazon ECS console.
  2. Choose your cluster.
  3. Select the service, and then choose Update.
  4. Set Number of tasks to 0, and then save the configuration.
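The console steps above can also be done from a script. A minimal sketch, assuming a boto3 ECS client; `scale_service_to_zero` is a name I made up, while `update_service` and the `services_stable` waiter are the boto3 calls it leans on.

```python
def scale_service_to_zero(ecs_client, cluster, service, wait=True):
    """Set the service's desired task count to 0 so it can reach a steady state.

    ecs_client is assumed to be a boto3 ECS client, e.g. boto3.client("ecs").
    Not meant for production: ECS is now out of sync with what CloudFormation expects.
    """
    resp = ecs_client.update_service(cluster=cluster, service=service, desiredCount=0)

    if wait:
        # Blocks until ECS reports the service as stable (or the waiter gives up).
        ecs_client.get_waiter("services_stable").wait(cluster=cluster, services=[service])

    return resp["service"]["desiredCount"]
```

For example, `scale_service_to_zero(boto3.client("ecs"), "my-cluster", "my-service")`, where both names are placeholders for your own cluster and service.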

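So what do I do then?

Fix whatever stopped the service from stabilizing and make sure the container process can actually start. Then run an error-free stack update so that CloudFormation and ECS are back in sync.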
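To find out why the process can't start, the stopped tasks are usually the best clue. Here is a sketch, again assuming a boto3 ECS client; `stopped_task_reasons` is a hypothetical helper. Keep in mind that ECS only keeps stopped tasks around for a limited time, so run it soon after a failed deployment.

```python
def stopped_task_reasons(ecs_client, cluster, service, limit=10):
    """Collect stop reasons for recently stopped tasks of a service.

    ecs_client is assumed to be a boto3 ECS client, e.g. boto3.client("ecs").
    """
    arns = ecs_client.list_tasks(
        cluster=cluster, serviceName=service, desiredStatus="STOPPED"
    )["taskArns"][:limit]
    if not arns:
        return []

    tasks = ecs_client.describe_tasks(cluster=cluster, tasks=arns)["tasks"]
    return [
        {
            "task": t["taskArn"],
            # Task-level reason, e.g. a failed load balancer health check.
            "stopped_reason": t.get("stoppedReason"),
            # Container-level details: (name, exit code, reason).
            "containers": [
                (c["name"], c.get("exitCode"), c.get("reason"))
                for c in t.get("containers", [])
            ],
        }
        for t in tasks
    ]
```

The output maps fairly directly onto the list of common causes: image pull errors show up in the container `reason`, crashes in the exit code, and health check failures in `stopped_reason`.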
