Delay between retries for Supervisor

icanhazstring

Andreas Frömer

Posted on November 4, 2021

Having a process management tool on current systems is a must. The most commonly used ones, or at least the ones I have been using, are systemd and supervisor.

While both do basically the same job, supervisor lacks one crucial ability: a delay between process restarts.

While this is not always necessary, in some cases it prevents your process from entering the FATAL state, which stops it altogether.

But why isn't there such an option?

The solution already exists

The problem is not that it is impossible to add such a feature, but rather that the maintainers need to merge and maintain it.

A feature request from 2014 and a pull request from 2015 that implements it already exist, but at the time of writing the pull request has not been merged.

To be clear: I am not blaming the maintainers for not merging it. It is simply a matter of how much time the maintainers have.

How can you still add a delay

To still add a delay you need to get somewhat creative.
While searching for a solution I found the feature request for supervisor (see above), which mentions a workaround: append a sleep X to the command of your supervisor program.

[program:www]
command=bash -c "<path to your script>; sleep X"

While this might work, the problem is that supervisor cannot gracefully stop this process using a SIGTERM signal. The SIGTERM only reaches the bash process, not your actual script.

So what can you do?
While reading further into the feature request I found a pull request to the Symfony documentation for the symfony/messenger component, in which the author added a script that is capable of forwarding the SIGTERM signal to the child process.

You can find the PR here: https://github.com/symfony/symfony-docs/pull/13597

Extracted from this PR, the script looks as follows:

#!/bin/bash

# Supervisor sends TERM to services when stopped.
# This wrapper has to pass the signal to its child.
# Note that we send TERM (graceful) instead of KILL (immediate).
_term() {
    kill -TERM "$child" 2>/dev/null
    exit 1
}

trap _term SIGTERM

# Execute the wrapped command (console.php in the Symfony docs) with whatever arguments were specified to this script
"$@" &
child=$!
wait "$child"
rc=$?

# Delay to prevent supervisor from restarting too fast on failure
sleep 30

# Return with the exit code of the wrapped process
exit $rc

So the "only" thing you need to change is to prepend this wrapper to your previous command:

[program:www]
command=/path/to/wrapper <path to your script>

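This way every SIGTERM sent by supervisor is forwarded to your script, and the wrapper exits right away without the delay. If your child process exits on its own, the wrapper sleeps for 30 seconds before exiting with the child's exit code. Note that the script as written sleeps regardless of the exit code, not only when it is larger than 0. After that period, supervisor will restart the process.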
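
The wrapper script shown earlier runs its sleep 30 every time the child exits on its own, whatever the exit code. If you only want the delay after a failure, something like the following could work. This is a minimal sketch and not part of either pull request: the function name run_with_delay and the environment variable RESTART_DELAY are made up for illustration and are not supervisor options. Unlike the original, the signal handler here also waits for the child to finish before the wrapper exits.

```shell
#!/bin/bash
# Sketch of a wrapper variant that delays only after a failed run.
# RESTART_DELAY and run_with_delay are hypothetical names, not supervisor features.

run_with_delay() {
    local delay="${RESTART_DELAY:-30}"
    local rc=0

    # Forward TERM (graceful) to the child, wait for it to finish,
    # then leave without any delay.
    trap 'kill -TERM "$child" 2>/dev/null; wait "$child"; exit 1' TERM

    # Run the wrapped command in the background so the trap can fire
    # while this shell sits in "wait".
    "$@" &
    child=$!
    wait "$child" || rc=$?

    # The child is gone; restore default TERM handling so a stop request
    # during the delay below ends the wrapper immediately.
    trap - TERM

    # Sleep only when the child reported a failure.
    if [ "$rc" -ne 0 ]; then
        sleep "$delay"
    fi
    return "$rc"
}

# When invoked with a command, run it through the function
# and pass the child's exit code on to supervisor.
if [ "$#" -gt 0 ]; then
    run_with_delay "$@"
    exit $?
fi
```

Saved as /path/to/wrapper, it is used in the supervisor config exactly like the original wrapper. Without RESTART_DELAY set, the delay stays at the 30 seconds of the original script.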


Credit goes to the authors of both pull requests, as they have already done the work. I am just a messenger :)
