Extending the shutdown timeout setting to ensure graceful IHostedService shutdown

andrewlocknet

Andrew Lock "Sock"

Posted on May 31, 2020

Extending the shutdown timeout setting to ensure graceful IHostedService shutdown

Extending the shutdown timeout setting to ensure graceful IHostedService shutdown

I was seeing an issue recently where our application wasn't running the StopAsync method in our IHostedService implementations when the app was shutting down. It turns out that this was due to some services taking too long to respond to the shutdown signal. In this post I show an example of the problem, discuss why it happens, and how to avoid it.

Running background services with IHostedService

ASP.NET Core 2.0 introduced the IHostedService interface for running background tasks. The interface consists of two methods:

public interface IHostedService
{
    Task StartAsync(CancellationToken cancellationToken);
    Task StopAsync(CancellationToken cancellationToken);
}

StartAsync is called when the application is starting up. In ASP.NET Core 2.x this occurs just after the application starts handling requests, while in ASP.NET Core 3.x the hosted services are started just before the application starts handling requests.

StopAsync is called when the application receives the shut down (SIGTERM) signal, for example when you push CTRL+C in the console window, or the app is stopped by the host system. This allows you to close any open connections, dispose resources, and generally clean up your class as required.

In practice, there are actually some subtleties to implementing this interface that means that you typically want to derive from the helper class BackgroundService.

Steve Gordon has a course on Pluralsight "Building ASP.NET Core Hosted Services and .NET Core Worker Services" if you want to learn more.

Problems shutting down an IHostedService implementation

The problem I saw recently was causing an OperationCanceledException to be thrown when the application was shutting down:

Unhandled exception. System.OperationCanceledException: The operation was canceled.
   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at Microsoft.Extensions.Hosting.Internal.Host.StopAsync(CancellationToken cancellationToken)

I traced the source of this problem to one particular IHostedService implementation. We use IHostedServices as the host for each of our Kafka consumers. The specifics of this aren't important - the key is just that shutting down the IHostedService is relatively slow: it can take several seconds to cancel the subscription.

Part of the problem is the way the Kafka library (and underlying librdkafka library) uses synchronous, blocking Consume calls, instead of async, cancellable calls. There's not a great way around that.

The easiest way to understand this issue is with an example.

Demonstrating the problem

The easiest way to understand the problem is to create an application containing two IHostedService implementations:

  • NormalHostedService logs when it starts up and shuts down, then returns immediately.
  • SlowHostedService logs when it starts and stops, but takes 10s to complete shutdown

The implementations for these two classes are shown below. The NormalHostedService is very simple:

public class NormalHostedService : IHostedService
{
    readonly ILogger<NormalHostedService> _logger;

    public NormalHostedService(ILogger<NormalHostedService> logger)
    {
        _logger = logger;
    }

    public Task StartAsync(CancellationToken cancellationToken)
    {
        _logger.LogInformation("NormalHostedService started");
        return Task.CompletedTask;
    }

    public Task StopAsync(CancellationToken cancellationToken)
    {
        _logger.LogInformation("NormalHostedService stopped");
        return Task.CompletedTask;
    }
}

The SlowHostedService is almost identical, but it has a Task.Delay that takes 10s to simulate a slow shutdown

public class SlowHostedService : IHostedService
{
    readonly ILogger<SlowHostedService> _logger;

    public SlowHostedService(ILogger<SlowHostedService> logger)
    {
        _logger = logger;
    }

    public Task StartAsync(CancellationToken cancellationToken)
    {
        _logger.LogInformation("SlowHostedService started");
        return Task.CompletedTask;
    }

    public async Task StopAsync(CancellationToken cancellationToken)
    {
        _logger.LogInformation("SlowHostedService stopping...");
        await Task.Delay(10_000);
        _logger.LogInformation("SlowHostedService stopped");
    }
}

The IHostedServices I had in practice only took 1s to shutdown, but we had many of them, so the overall effect was the same as above!

The order the services are registered in ConfigureServices is important in this case - to demonstrate the issue, we need SlowHostedService to be shut down first. Services are shut down in reverse order, which means we need to register it last:

public void ConfigureServices(IServiceCollection services)
{
    services.AddHostedService<NormalHostedService>();
    services.AddHostedService<SlowHostedService>();
}

When we run the application, you'll see the starting logs as usual:

info: ExampleApp.NormalHostedService[0]
      NormalHostedService started
info: ExampleApp.SlowHostedService[0]
      SlowHostedService started
...
info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.

However, if you press CTRL+C to shut down the application, there's a problem. The SlowHostedService completes shutting down, but then an OperationCanceledException is thrown:

info: Microsoft.Hosting.Lifetime[0]
      Application is shutting down...
info: ExampleApp.SlowHostedService[0]
      SlowHostedService stopping...
info: ExampleApp.SlowHostedService[0]
      SlowHostedService stopped

Unhandled exception. System.OperationCanceledException: The operation was canceled.
   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at Microsoft.Extensions.Hosting.Internal.Host.StopAsync(CancellationToken cancellationToken)
   at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.WaitForShutdownAsync(IHost host, CancellationToken token)
   at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.RunAsync(IHost host, CancellationToken token)
   at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.RunAsync(IHost host, CancellationToken token)
   at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.Run(IHost host)
   at ExampleApp.Program.Main(String[] args) in C:\repos\andrewlock\blog-examples\SlowShutdown\Program.cs:line 16

The NormalHostedService.StopAsync() method is never called. If the service needed to do some cleanup then you have a problem. For example, maybe you need to gracefully deregister the service from Consul, or unsubscribe from Kafka topics - that won't happen now.

So what's going on here? Where's that timeout coming from?

The reason: HostOptions.ShutDownTimeout

You can find the problematic code in the framework Host implementation that runs on application shutdown. A simplified version is shown below:

internal class Host: IHost, IAsyncDisposable
{
    private readonly HostOptions _options;
    private IEnumerable<IHostedService> _hostedServices;

    public async Task StopAsync(CancellationToken cancellationToken = default)
    {
        // Create a cancellation token source that fires after ShutdownTimeout seconds
        using (var cts = new CancellationTokenSource(_options.ShutdownTimeout))
        using (var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(cts.Token, cancellationToken))
        {
            // Create a token, which is cancelled if the timer expires
            var token = linkedCts.Token;

            // Run StopAsync on each registered hosted service
            foreach (var hostedService in _hostedServices.Reverse())
            {
                // stop calling StopAsync if timer expires
                token.ThrowIfCancellationRequested();
                try
                {
                    await hostedService.StopAsync(token).ConfigureAwait(false);
                }
                catch (Exception ex)
                {
                    exceptions.Add(ex);
                }
            }
        }

        // .. other stopping code
    }
}

The key point here is the CancellationTokenSource that is configured to fire after HostOptions.ShutdownTimeout. By default, this fires after 5 seconds. That means hosted service shutdown is abandoned after 5s - shut down of all IHostedServices has to happen within this timeout.

public class HostOptions
{
    public TimeSpan ShutdownTimeout { get; set; } = TimeSpan.FromSeconds(5);
}

On the first iteration of the foreach loop, the SlowHostedService.Stopasync() executes, which takes 10s to run. On the second iteration, the 5s timeout is exceeded, and so token.ThrowIfCancellationRequested(); throws an OperationConcelledException. That exits the control flow, and NormalHostedService.Stopasync() is never executed.

There's a simple solution to this - increase the shutdown timeout!

The solution: increase the shutdown timeout

HostOptions isn't explicitly configured anywhere by default, so you will need to configure it manually in your ConfigureSerices method. For example, the following config increases the timeout to 15s:

public void ConfigureServices(IServiceCollection services)
{
    services.AddHostedService<NormalHostedService>();
    services.AddHostedService<SlowShutdownHostedService>();

    // Configure the shutdown to 15s
    services.Configure<HostOptions>(
        opts => opts.ShutdownTimeout = TimeSpan.FromSeconds(15));
}

Alternatively, you can also load the timeout from configuration. For example, if you add the following to appsettings.json:

{
    "HostOptions": {
        "ShutdownTimeout": "00:00:15"
    }
    // other config
}

You can then bind the HostOptions configuration section to the HostOptions object:

public class Startup
{
    public IConfiguration Configuration { get; }
    public Startup(IConfiguration configuration)
    {
        Configuration = configuration;
    }

    public void ConfigureServices(IServiceCollection services)
    {
        services.AddHostedService<NormalHostedService>();
        services.AddHostedService<SlowShutdownHostedService>();

        // bind the config to host options
        services.Configure<HostOptions>(Configuration.GetSection("HostOptions"));
    }
}

This binds the serialised TimeSpan value 00:00:15 to the HostOptions value and sets the timeout to 15s. With that configuration, now when we stop the application, the services all shutdown correctly:

info: Microsoft.Hosting.Lifetime[0]
      Application is shutting down...
info: SlowShutdown.SlowShutdownHostedService[0]
      SlowShutdownHostedService stopping...
info: SlowShutdown.SlowShutdownHostedService[0]
      SlowShutdownHostedService stopped
info: SlowShutdown.NormalHostedService[0]
      NormalHostedService stopped

Your application will now wait up to 15s for all the hosted services to finish shutting down before exiting!

Summary

In this post I discussed an issue recently where our application wasn't running the StopAsync method in our IHostedService implementations when the app was shutting down. This was due to some background services taking too long to respond to the shutdown signal, and exceeding the shutdown timeout. I demonstrated the problem with a single service taking 10s to shutdown, but in practice it happens whenever the total shutdown time for all services exceeds the default 5s.

The solution to the problem was to extend the HostOptions.ShutdownTimeout configuration value to be longer than 5s, using the standard ASP.NET Core IOptions<T> configuration system.

💖 💪 🙅 🚩
andrewlocknet
Andrew Lock "Sock"

Posted on May 31, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related