A tentative comparison of fault tolerance libraries on the JVM

Nicolas Fränkel

Posted on January 9, 2022

Whether or not you're implementing microservices, chances are that you're calling HTTP endpoints. A lot can go wrong with HTTP calls. Experienced developers plan for this and design beyond the happy path. In general, fault tolerance encompasses the following features:

  • Retry
  • Timeout
  • Circuit Breaker
  • Fallback
  • Rate Limiter, to avoid server-side 429 responses
  • Bulkhead: whereas a Rate Limiter limits the number of calls in a given timeframe, a Bulkhead limits the number of concurrent calls
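
To make that last distinction concrete, here is a minimal plain-Java sketch. It is not taken from any of the libraries discussed in this post: the names LimiterSketch, FixedWindowRateLimiter, and Bulkhead are hypothetical, and the fixed-window strategy is only one assumed way to count calls per timeframe.

```java
import java.util.concurrent.Semaphore;
import java.util.function.LongSupplier;

// Hypothetical helpers for illustration; not part of any library discussed here.
class LimiterSketch {

    /** Rate limiter: at most `limit` calls per fixed time window. */
    static final class FixedWindowRateLimiter {
        private final int limit;
        private final long windowMillis;
        private final LongSupplier clock;   // injected so the window can be tested without sleeping
        private long windowStart;
        private int callsInWindow;

        FixedWindowRateLimiter(int limit, long windowMillis, LongSupplier clock) {
            this.limit = limit;
            this.windowMillis = windowMillis;
            this.clock = clock;
            this.windowStart = clock.getAsLong();
        }

        synchronized boolean tryAcquire() {
            long now = clock.getAsLong();
            if (now - windowStart >= windowMillis) {   // a new window begins: reset the counter
                windowStart = now;
                callsInWindow = 0;
            }
            if (callsInWindow >= limit) {
                return false;                          // budget for this window is used up
            }
            callsInWindow++;
            return true;
        }
    }

    /** Bulkhead: at most `maxConcurrent` calls in flight, however long they take. */
    static final class Bulkhead {
        private final Semaphore permits;

        Bulkhead(int maxConcurrent) {
            this.permits = new Semaphore(maxConcurrent);
        }

        boolean tryEnter() {
            return permits.tryAcquire();
        }

        void leave() {
            permits.release();
        }
    }

    public static void main(String[] args) {
        var limiter = new FixedWindowRateLimiter(2, 1_000, System::currentTimeMillis);
        System.out.println(limiter.tryAcquire() + " " + limiter.tryAcquire() + " " + limiter.tryAcquire());

        var bulkhead = new Bulkhead(1);
        System.out.println(bulkhead.tryEnter() + " " + bulkhead.tryEnter());
        bulkhead.leave();                              // a finished call frees its slot immediately
        System.out.println(bulkhead.tryEnter());
    }
}
```

The rate limiter only frees capacity when time passes, while the bulkhead frees capacity as soon as a call completes.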

Several libraries implement these features on the JVM. In this post, we will look at Microprofile Fault Tolerance, Failsafe and Resilience4J.

Microprofile Fault Tolerance

Microprofile Fault Tolerance comes from the Microprofile umbrella project. It differs from the two others because it's a specification, which relies on a runtime to provide its capabilities. For example, Open Liberty is one such runtime. SmallRye Fault Tolerance is another one. In turn, other components such as Quarkus and WildFly embed SmallRye.

Microprofile defines annotations for each feature: @Timeout, @Retry, @Fallback, @CircuitBreaker, and @Bulkhead. It also defines @Asynchronous.

Because the runtime reads annotations, one should carefully read the documentation to understand how they interact if more than one is set.

A @Fallback can be specified and it will be invoked if the TimeoutException is thrown. If @Timeout is used together with @Retry, the TimeoutException will trigger the retry. When @Timeout is used with @CircuitBreaker and if a TimeoutException occurs, the failure will contribute towards the circuit open.

-- Timeout Usage
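
The first two rules of the quote can be modelled in plain Java. The sketch below is only a mental model, not how a Microprofile runtime implements the annotations: the class TimeoutRetryFallbackSketch, the method call, and its parameters are hypothetical. It assumes that each attempt gets its own timeout, that a timed-out attempt triggers the next retry, and that the fallback runs once the retries are exhausted. The Circuit Breaker rule is left out.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Plain-Java model of the quoted rules; hypothetical names, not a Microprofile API.
class TimeoutRetryFallbackSketch {

    // Daemon threads, so that an abandoned slow call never keeps the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(runnable -> {
        Thread thread = new Thread(runnable);
        thread.setDaemon(true);
        return thread;
    });

    static <T> T call(Supplier<T> action, long timeoutMillis, int maxRetries, Supplier<T> fallback) {
        Callable<T> task = action::get;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            Future<T> future = POOL.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true);                 // interrupt the slow attempt, then retry
            } catch (ExecutionException e) {
                // the action itself threw: treated as a failure too, so retry
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // the caller was interrupted: stop retrying
                break;
            }
        }
        return fallback.get();                       // every attempt failed
    }

    public static void main(String[] args) {
        Supplier<String> slow = () -> {
            try {
                Thread.sleep(2_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "too late";
        };
        System.out.println(call(slow, 100, 1, () -> "fallback"));
    }
}
```

In the demo, the call never finishes within 100 ms, so both attempts time out and the fallback value is printed.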

Resilience4J

I came upon Resilience4J while giving my talk on the Circuit Breaker pattern. The talk included a demo that relied on Hystrix. One day, I wanted to update the demo to the latest Hystrix version and noticed that its maintainers had deprecated it in favor of Resilience4J.

Resilience4J is based on several core concepts:

  • One JAR per fault tolerance feature, with additional JARs for specific integrations, e.g., Kotlin
  • Static factories
  • Function composition via the Decorator pattern applied to functions
  • Integration with Java's functional interfaces, e.g., Runnable, Callable, Function, etc.
  • Exception propagation: one can use a functional interface that throws, and the library will propagate it across the call pipeline
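
As a rough illustration of function decoration and exception propagation, here is a plain-Java sketch. Nothing in it is Resilience4J code: withRetry and withFallback are hypothetical static factories made up for the example, and they only handle unchecked exceptions to keep things short.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helpers that illustrate decoration; these are not Resilience4J classes.
class DecoratorSketch {

    /** Static factory: returns a supplier that invokes `inner` up to maxAttempts times. */
    static <T> Supplier<T> withRetry(int maxAttempts, Supplier<T> inner) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        return () -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return inner.get();
                } catch (RuntimeException e) {
                    last = e;                  // remember the failure and try again
                }
            }
            throw last;                        // attempts exhausted: propagate to the outer decorator
        };
    }

    /** Static factory: returns a supplier that maps a propagated exception to a value. */
    static <T> Supplier<T> withFallback(Supplier<T> inner, Function<RuntimeException, T> handler) {
        return () -> {
            try {
                return inner.get();
            } catch (RuntimeException e) {
                return handler.apply(e);
            }
        };
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Supplier<String> flaky = () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("attempt " + calls[0] + " failed");
            }
            return "ok";
        };
        // The fallback decorates the retry, which decorates the base function.
        Supplier<String> decorated = withFallback(withRetry(3, flaky), e -> "fallback");
        System.out.println(decorated.get() + " after " + calls[0] + " calls");
    }
}
```

Each decorator returns the same functional interface that it accepts, which is what allows one decorator to call another.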

Here's a simplified class diagram for Retry.

[Figure: Resilience4J Retry API class diagram]

Each fault tolerance feature is built around the same template seen above. One can create a pipeline of several features by leveraging function composition, each one calling another one.

Let's analyze a sample:

var retrySupplier = Retry.decorateSupplier(                                  // 1
    Retry.ofDefaults("retry"),                                               // 2
    () -> server.call()                                                      // 1
);
var config = new CircuitBreakerConfig.Builder()                              // 3
        .slowCallDurationThreshold(Duration.ofMillis(200))                   // 4
        .slidingWindowSize(2)                                                // 5
        .minimumNumberOfCalls(2)                                             // 6
        .build();
var breakerSupplier = CircuitBreaker.of("circuit-breaker", config)           // 7
                                    .decorateSupplier(retrySupplier);        // 7
var supplier = SupplierUtils.recover(                                        // 8
    breakerSupplier,
    List.of(IllegalStateException.class, CallNotPermittedException.class),   // 9
    e -> "fallback"                                                         // 10
);
  1. Decorate the base server.call() function with Retry: this function is the one to be protected
  2. Use the default configuration
  3. Create a new Circuit Breaker config
  4. Set the threshold above which a call is considered to be slow
  5. Count over a sliding window of 2 calls
  6. Minimum number of calls required before the Circuit Breaker can compute the failure rate and decide whether to open
  7. Decorate the retry function with a Circuit Breaker with the above config
  8. Wrap the result with a recovery function that provides a fallback value, e.g., when the Circuit Breaker is open
  9. List of exceptions to handle: they won't be propagated. Resilience4J throws a CallNotPermittedException when the circuit is open.
  10. In case any of the configured exceptions are thrown, call this function instead
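
The sliding window and the minimum number of calls are easier to picture with a stripped-down model. The sketch below is plain Java with hypothetical names (CircuitBreakerSketch, CallRejectedException). It is not the Resilience4J implementation: it ignores slow calls, has no half-open state, is not thread-safe, and takes the failure-rate threshold as an explicit assumed parameter.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Simplified count-based breaker; hypothetical names, not the Resilience4J implementation.
class CircuitBreakerSketch {

    static final class CallRejectedException extends RuntimeException {
        CallRejectedException() {
            super("circuit is open");
        }
    }

    private final int slidingWindowSize;
    private final int minimumNumberOfCalls;
    private final double failureRateThreshold;                  // e.g. 0.5 for 50%
    private final Deque<Boolean> failures = new ArrayDeque<>(); // true = the call failed
    private boolean open;

    CircuitBreakerSketch(int slidingWindowSize, int minimumNumberOfCalls, double failureRateThreshold) {
        this.slidingWindowSize = slidingWindowSize;
        this.minimumNumberOfCalls = minimumNumberOfCalls;
        this.failureRateThreshold = failureRateThreshold;
    }

    boolean isOpen() {
        return open;
    }

    <T> T execute(Supplier<T> call) {
        if (open) {
            throw new CallRejectedException();   // fail fast without invoking the call
        }
        T result;
        try {
            result = call.get();
        } catch (RuntimeException e) {
            record(true);
            throw e;                             // the original exception still propagates
        }
        record(false);
        return result;
    }

    private void record(boolean failed) {
        failures.addLast(failed);
        if (failures.size() > slidingWindowSize) {
            failures.removeFirst();              // keep only the most recent outcomes
        }
        if (failures.size() < minimumNumberOfCalls) {
            return;                              // not enough calls to judge yet
        }
        long failedCalls = failures.stream().filter(Boolean::booleanValue).count();
        if ((double) failedCalls / failures.size() >= failureRateThreshold) {
            open = true;
        }
    }

    public static void main(String[] args) {
        var breaker = new CircuitBreakerSketch(2, 2, 0.5);
        for (int i = 1; i <= 3; i++) {
            try {
                breaker.execute(() -> { throw new IllegalStateException("server error"); });
            } catch (RuntimeException e) {
                System.out.println("call " + i + ": " + e.getClass().getSimpleName());
            }
        }
    }
}
```

With a window of 2 and a minimum of 2 calls, the first failure alone cannot open the circuit. The second one can, and later calls are rejected before they reach the server.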

The order in which functions are composed can be hard to decipher. Hence, the project offers the Decorators class to combine functions using a fluent API. You can find it in the resilience4j-all module. One can rewrite the above code as:

var pipeline = Decorators.ofSupplier(() -> server.call())
    .withRetry(Retry.ofDefaults("retry"))
    .withCircuitBreaker(CircuitBreaker.of("circuit-breaker", config))
    .withFallback(
        List.of(IllegalStateException.class, CallNotPermittedException.class),
        e -> "fallback"
    );

It makes the intent much clearer.

Failsafe

I stumbled upon Failsafe not long ago. Its tenets are similar to Resilience4J: static factories, function composition, and exception propagation.

While Resilience4J's fault tolerance features don't share a class hierarchy, Failsafe provides the concept of Policy:

[Figure: Failsafe Retry API class diagram]

I believe the main difference with Resilience4J lies in its pipelining approach. Resilience4J's API requires you first to provide the "base" function and then embed it inside any wrapper function. You cannot reuse the pipeline on top of different base functions. Failsafe allows it via the FailsafeExecutor class.

[Figure: Failsafe API class diagram]

Here's how to create a pipeline, i.e., an instance of FailsafeExecutor.
Notice there's no reference to the base call:

var pipeline = Failsafe.with(                            // 1
    Fallback.of("fallback"),                             // 2
    Timeout.of(Duration.of(2000, MILLIS)),               // 3
    RetryPolicy.ofDefaults()                             // 4
);
  1. Define the list of policies; they are applied in order from the last to the first, so the last one wraps the call directly
  2. Fallback value
  3. Throw a TimeoutExceededException if the call exceeds 2000 ms
  4. Default retry policy

At this point, it's possible to wrap the call:

pipeline.get(() -> server.call());
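
To show why separating the pipeline from the base call matters, here is a plain-Java sketch of the idea. The types PipelineSketch, Policy, and Pipeline are hypothetical stand-ins, not Failsafe classes, and only unchecked exceptions are handled. The point is that the pipeline holds policies only, so the same instance can protect different base functions.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical types that mimic the idea of a reusable pipeline; not Failsafe classes.
class PipelineSketch {

    /** A policy turns a supplier into a protected supplier. */
    interface Policy<T> {
        Supplier<T> wrap(Supplier<T> inner);
    }

    /** Holds policies only; the base call is provided later, at execution time. */
    static final class Pipeline<T> {
        private final List<Policy<T>> policies;

        private Pipeline(List<Policy<T>> policies) {
            this.policies = policies;
        }

        @SafeVarargs
        static <T> Pipeline<T> with(Policy<T>... policies) {
            return new Pipeline<>(List.of(policies));
        }

        T get(Supplier<T> base) {
            Supplier<T> current = base;
            // Wrap from the last policy to the first, so the first policy ends up outermost.
            for (int i = policies.size() - 1; i >= 0; i--) {
                current = policies.get(i).wrap(current);
            }
            return current.get();
        }
    }

    static <T> Policy<T> fallback(T value) {
        return inner -> () -> {
            try {
                return inner.get();
            } catch (RuntimeException e) {
                return value;
            }
        };
    }

    static <T> Policy<T> retry(int maxAttempts) {
        return inner -> () -> {
            RuntimeException last = new IllegalStateException("no attempt was made");
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return inner.get();
                } catch (RuntimeException e) {
                    last = e;
                }
            }
            throw last;
        };
    }

    public static void main(String[] args) {
        Pipeline<String> pipeline = Pipeline.with(fallback("fallback"), retry(3));
        // The same pipeline instance protects two unrelated base functions.
        System.out.println(pipeline.get(() -> "first service"));
        System.out.println(pipeline.get(() -> { throw new IllegalStateException("down"); }));
    }
}
```

Here the retry policy wraps the base call and the fallback wraps the retry, which matches the last-to-first order described above.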

Failsafe also provides a fluent API. One can rewrite the above code as:

var pipeline = Failsafe.with(Fallback.of("fallback"))
    .compose(Timeout.of(Duration.of(2000, MILLIS)))
    .compose(RetryPolicy.ofDefaults());

Conclusion

All three libraries provide more or less the same features. If you don't use a CDI-compliant runtime, such as a regular application server or Quarkus, forget about Microprofile Fault Tolerance.

Failsafe and Resilience4J are both based on function composition and are pretty similar. If you need to define your function pipeline independently of the base call, prefer Failsafe. Otherwise, pick any of them.

As I'm more familiar with Resilience4J, I'll probably use Failsafe in my next project to get more experience with it.

Originally published at A Java Geek on January 7th, 2022
