Resilient microservices with Quarkus and Kotlin - Design Patterns
Felicia Faye
Posted on July 5, 2022
In the first part of the series, we took a closer look at why it is a good strategy to combine Quarkus with Kotlin. Do you also remember the snake scenario? We wanted to create microservices that are more resilient than they are by default, just to make sure that we're safe most of the time.
Resilience design patterns are a good starting point. Most of them aim to reduce downtime and shorten startup times, so that, for example, containers running in a cluster achieve high uptime and can recover rapidly ⚡︎.
Resilience design patterns can be grouped into four main sections:
- Isolation
- Loose Coupling
- Supervision
- Latency Control
Isolation patterns focus on concerns such as parameter checking, bulkheads and load shedding. Loose coupling stands for asynchronous communication, idempotency and an event-driven architecture. Supervision patterns deal with monitoring, escalation and error handling, while latency control is about retry, fallback and timeout mechanisms, including patterns like circuit breakers and fan in/fan out strategies.
We are mainly going to take a closer look at the latency control patterns, but we will also touch on some of the other groups that help us build more resilient services.
First, there is the retry pattern, where we simply retry whatever failed before. In our snake scenario, imagine a snake detector service that needs to call a push notification service, which isn't very reliable.
As a solution we could just retry until we succeed.
This strategy is suitable for issues like:
- temporary network issues and packet loss
- internal server errors from the target system
- missing or slow responses from the target system
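To make the idea concrete, here is a minimal, framework-agnostic sketch in plain Kotlin (no Quarkus APIs involved). The `retry` helper and its parameters are hypothetical names; the sketch assumes that only `IOException`s are worth retrying and doubles the pause between attempts (exponential backoff), so that a struggling target is not hit again immediately.

```kotlin
import java.io.IOException

// Hypothetical helper: calls `block` up to `maxAttempts` times and doubles the
// pause between attempts (exponential backoff). Only errors accepted by
// `retryOn` are retried; everything else is rethrown immediately.
fun <T> retry(
    maxAttempts: Int = 3,
    initialDelayMillis: Long = 100,
    retryOn: (Exception) -> Boolean = { it is IOException },
    block: () -> T,
): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var delayMillis = initialDelayMillis
    var lastError: Exception? = null
    for (attempt in 1..maxAttempts) {
        try {
            return block()
        } catch (e: Exception) {
            if (!retryOn(e)) throw e
            lastError = e
            // Wait before the next attempt, but not after the last one.
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis)
                delayMillis *= 2
            }
        }
    }
    // All attempts failed with retryable errors: surface the last one.
    throw lastError!!
}
```

A call such as `retry { pushClient.send(alert) }`, where `pushClient` is a placeholder, would make up to three attempts. The backoff only spaces out the retries; on its own it does not protect a target that is already overloaded.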
But wait... what if the target system is already overloaded? Sometimes it can be a bad idea to simply increase the number of calls in the hope that everything will work, and even then, success is still not guaranteed.
So let's check another option: the fallback pattern, which switches to an alternative when a call fails. In our case, we rely on an external push notification service.
If it fails, we can call an alternative service instead, which may be more expensive but also more reliable. Sometimes it is also valid to have no fallback as the fallback. For example, a shopping order service might run an external fraud check, but the check may not be important enough to reject orders when it is unavailable, at least for smaller amounts, so that customers still get a great shopping experience. In our case, however, we must make sure that a push notification service is called successfully.
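Both variants can be sketched in a few lines of plain Kotlin, assuming hypothetical `PushProvider` implementations and a fraud-check function that throws when its service is unavailable. `sendWithFallback` switches to the alternative provider on any exception, while `acceptOrder` illustrates "no fallback as fallback": if the fraud check cannot be performed, small orders are accepted anyway and larger ones are rejected. The `smallOrderLimit` threshold is an illustrative assumption.

```kotlin
// Hypothetical abstraction over a push notification service.
fun interface PushProvider {
    fun send(message: String): String
}

// Tries the primary provider first and switches to the alternative if it throws.
fun sendWithFallback(message: String, primary: PushProvider, alternative: PushProvider): String =
    try {
        primary.send(message)
    } catch (e: Exception) {
        alternative.send(message)
    }

// "No fallback as fallback": `fraudCheck` returns true if the order looks fine.
// If the check itself fails, small orders are accepted instead of rejected.
fun acceptOrder(amount: Double, smallOrderLimit: Double, fraudCheck: (Double) -> Boolean): Boolean =
    try {
        fraudCheck(amount)
    } catch (e: Exception) {
        amount <= smallOrderLimit
    }
```

Catching every `Exception` keeps the sketch short; a real service would usually fall back only on errors that indicate the primary is unavailable.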
Then there is also the timeout pattern, which is well known from the HTTP world, where clients commonly configure connection and read timeouts. It describes waiting a certain amount of time for a successful answer and then actively giving up. In scenarios where we cannot be sure that a call made it through, or when we deal with slow servers, it can be very helpful to configure and fine-tune the communication timeout values per service.
However, finding the right value can be difficult: we want the timeout to be long enough to tolerate slower responses, but short enough that we actively give up after a reasonable while. Tracking failure statistics can help to fine-tune the timeout values.
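One way to sketch an active timeout with only the JDK is to run the call on an `ExecutorService` and wait on the resulting `Future` with `get(timeout, unit)`. The `callWithTimeout` name and the choice to return `null` on timeout are assumptions for illustration; a real service would more likely throw or trigger a fallback.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ExecutorService
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException

// Runs `block` on `executor` and actively gives up after `timeoutMillis`.
// Returns null on timeout; exceptions thrown by `block` surface as ExecutionException.
fun <T> callWithTimeout(executor: ExecutorService, timeoutMillis: Long, block: () -> T): T? {
    val future = executor.submit(Callable { block() })
    return try {
        future.get(timeoutMillis, TimeUnit.MILLISECONDS)
    } catch (e: TimeoutException) {
        // Interrupt the slow call so it does not keep occupying a worker thread.
        future.cancel(true)
        null
    }
}
```

For example, `callWithTimeout(executor, 500) { pushClient.send(alert) }`, with `pushClient` as a placeholder, gives up after half a second. Cancelling the future interrupts the worker thread, which only frees it if the blocked call reacts to interruption.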
Circuit breakers are another pattern from the latency control family. They are inspired by electrical circuit breakers and work in much the same way. A circuit breaker sits between two components and controls the flow of requests by switching between three states: open, half-open and closed.
In the open state, the circuit breaker has tripped and is active, so all requests are blocked and cannot pass, as in the illustration below.
A circuit breaker is half-open when only a limited number of requests are allowed to pass, typically to test whether the downstream system has recovered.
When the circuit breaker is closed, all requests go through without any restrictions.
This concept can help to protect systems from breaking down under overload, because a struggling service gets time to recover instead of receiving even more requests. It is especially helpful in combination with the retry, fallback and timeout patterns.
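The three states translate naturally into a small state machine. The following is a minimal, single-threaded sketch rather than a production implementation; the class names, the consecutive-failure threshold and the fixed open duration are assumptions. Once `openDurationMillis` has passed, the next call is let through as a trial request (half-open): success closes the circuit, failure opens it again. The clock is injectable so the behavior can be tested without waiting.

```kotlin
// CLOSED lets everything through, OPEN blocks everything,
// HALF_OPEN lets a trial request through to probe the downstream service.
enum class BreakerState { CLOSED, OPEN, HALF_OPEN }

class CircuitOpenException : RuntimeException("circuit breaker is open")

// Minimal, single-threaded sketch; thresholds and durations are illustrative.
class CircuitBreaker(
    private val failureThreshold: Int = 3,
    private val openDurationMillis: Long = 5_000,
    private val clock: () -> Long = System::currentTimeMillis,
) {
    var state = BreakerState.CLOSED
        private set
    private var consecutiveFailures = 0
    private var openedAt = 0L

    fun <T> call(block: () -> T): T {
        if (state == BreakerState.OPEN) {
            if (clock() - openedAt >= openDurationMillis) {
                state = BreakerState.HALF_OPEN // let this call through as a trial
            } else {
                throw CircuitOpenException() // fail fast without calling downstream
            }
        }
        return try {
            block().also { onSuccess() }
        } catch (e: Exception) {
            onFailure()
            throw e
        }
    }

    private fun onSuccess() {
        consecutiveFailures = 0
        state = BreakerState.CLOSED
    }

    private fun onFailure() {
        consecutiveFailures++
        // A failed trial, or too many failures in a row, (re)opens the circuit.
        if (state == BreakerState.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = BreakerState.OPEN
            openedAt = clock()
        }
    }
}
```

In practice, a `CircuitOpenException` is a natural trigger for a fallback, and a production breaker would additionally need thread safety and a cap on concurrent trial requests.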
Another very helpful pattern is called fan in / fan out, and it is also a good example of why we want loose coupling. Without loose coupling it would be complicated to apply fan in / fan out, because the pattern requires independent components: it is about scaling the workers or pods that produce and consume independently of each other. In this case, fan in means that if producers are too slow and consumers are fast, the producers should be scaled up and the consumers might also be slowed down. Fan out means that when producers are fast and consumers are slow, we want to scale up our consumers and possibly run fewer producers. This way, a system can handle different loads, for example at different times of day, by scaling workers up and down.
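The scaling decision itself can be sketched as a comparison of throughput on both sides of a queue. This hypothetical helper follows the definitions of fan in and fan out given above; the capacity inputs (messages per second each side can currently handle), the 10% tolerance and the per-consumer rate are illustrative assumptions. In a real cluster, the resulting action would be carried out by an autoscaler rather than by this function.

```kotlin
import kotlin.math.abs
import kotlin.math.ceil

enum class Scaling { FAN_IN, FAN_OUT, NONE }

// Compares what producers and consumers can handle (messages per second):
//   FAN_IN  -> producers are the bottleneck: add producers (maybe slow consumers down)
//   FAN_OUT -> consumers are the bottleneck: add consumers (maybe run fewer producers)
fun decideScaling(producerCapacity: Double, consumerCapacity: Double, tolerance: Double = 0.1): Scaling {
    val larger = maxOf(producerCapacity, consumerCapacity)
    // Treat small differences as balanced to avoid scaling back and forth.
    if (larger == 0.0 || abs(producerCapacity - consumerCapacity) <= tolerance * larger) return Scaling.NONE
    return if (producerCapacity < consumerCapacity) Scaling.FAN_IN else Scaling.FAN_OUT
}

// How many consumers are needed to keep up with a given production rate.
fun consumersNeeded(producedPerSecond: Double, ratePerConsumer: Double): Int {
    require(ratePerConsumer > 0.0) { "ratePerConsumer must be positive" }
    return ceil(producedPerSecond / ratePerConsumer).toInt().coerceAtLeast(1)
}
```

For example, producers emitting 250 messages per second with consumers that each handle 40 would call for 7 consumers.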
Now that we've seen some helpful design patterns in theory, it's time to evaluate how we can realize them with Quarkus and Kotlin to build more robust microservices. We'll see that in part 3.