Exponential backoff for AWS Lambda
Sohaib Tariq
Posted on September 5, 2020
I recently set up a Lambda function that reads messages from an SQS queue
and makes an API call to one of our microservices.
Naturally, this calls for an error-handling mechanism, since the microservice
could be down or unresponsive.
When a Lambda function consumes an SQS queue, retries come built in, courtesy of SQS itself. The Lambda poller picks up a message
from the queue, and the message becomes invisible to other consumers for a specific duration called the visibility timeout.
If the function completes successfully, Lambda automatically deletes the message from the queue.
If execution fails (for example, with an unhandled runtime exception), the message is not deleted, so it becomes visible
again once the visibility timeout passes and is received again. Every receive increments the message's ApproximateReceiveCount attribute.
The number of times a message can be received before it is finally moved to a dead-letter queue (DLQ) is set by
maxReceiveCount in the redrive policy of the SQS queue, and SQS checks it against the approximate receive count.
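To make the default behaviour concrete, here is a small in-memory model of a message that fails on every attempt. It is a hypothetical simulation, not the AWS SDK: it assumes each failure is instant and that the message is re-received the moment it becomes visible again. The 30-second visibility timeout and maxReceiveCount of 5 are illustrative values.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of SQS redelivery with a FIXED visibility timeout and a redrive
 * policy. Hypothetical simulation only: failures are assumed to be instant and
 * the message is assumed to be re-received as soon as it becomes visible.
 */
final class FixedTimeoutRedrive {

    /** Times (in seconds) at which each failing receive happens. */
    static List<Integer> receiveTimes(int visibilityTimeoutSeconds, int maxReceiveCount) {
        List<Integer> times = new ArrayList<>();
        int now = 0;
        // Each receive increments ApproximateReceiveCount; receives stop once
        // the count reaches maxReceiveCount.
        for (int receiveCount = 1; receiveCount <= maxReceiveCount; receiveCount++) {
            times.add(now);
            now += visibilityTimeoutSeconds; // invisible until the timeout passes
        }
        return times;
    }

    /** Roughly when the next receive would push the count past the limit, moving the message to the DLQ. */
    static int dlqTime(int visibilityTimeoutSeconds, int maxReceiveCount) {
        return visibilityTimeoutSeconds * maxReceiveCount;
    }

    public static void main(String[] args) {
        System.out.println("Receives at: " + receiveTimes(30, 5));       // Receives at: [0, 30, 60, 90, 120]
        System.out.println("Moved to DLQ at: " + dlqTime(30, 5) + " s"); // Moved to DLQ at: 150 s
    }
}
```

The retries are evenly spaced, one visibility timeout apart, no matter how long the downstream service stays broken.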
This retry mechanism was not exactly what I had in mind for our use case: it redelivers a failed message every time the
visibility timeout expires, so the retries are evenly spaced. I was thinking along the lines of a backoff strategy that keeps
retrying the API call with an exponentially increasing wait time, finally sending the message to a DLQ after a set number of
retries. This would give us ample time to fix any issues with our microservice and keep it from being bombarded with failing API calls.
This is what I ended up with:
First, a very basic Java snippet that calculates the exponential wait time in seconds, given the message's
receive count, recvCount:
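Random rand = new Random(); // java.util.Random
int randomInt = rand.nextInt(60); // jitter: 0 to 59 extra seconds
long result = (long) Math.pow(2, recvCount) + 30 + randomInt; // 2^recvCount + 30 s base + jitter
result = Math.min(result, 43200); // SQS rejects visibility timeouts above 43200 s (12 hours)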
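Notice the addition of randomInt. That is 'jitter': a bit of randomness. I read about it in some
documentation by Google Cloud and included it as a good practice. Without it, messages that failed at the
same moment (say, during an outage) would all become visible again at the same moment and hit the recovering
microservice together; the random offset spreads them out.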
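Written out, the snippet picks a visibility timeout t(n) in seconds, where n is recvCount and J is the jitter returned by rand.nextInt(60). The worked numbers below assume recvCount is the message's ApproximateReceiveCount (1 on the first delivery) and an illustrative maxReceiveCount of 5:

```latex
\begin{aligned}
t(n) &= 2^{n} + 30 + J, \qquad J \sim \mathcal{U}\{0, 1, \dots, 59\} \\
t(3) &= 8 + 30 + J \in [38,\ 97] \\
\sum_{n=1}^{5} \left(2^{n} + 30\right) &= (2 + 4 + 8 + 16 + 32) + 5 \cdot 30 = 62 + 150 = 212
\end{aligned}
```

So five failed attempts keep a message out of circulation for between 212 and 212 + 5 × 59 = 507 seconds, roughly 3.5 to 8.5 minutes. The exponential term only overtakes the 30-second base from n = 5 onward, and SQS's 43200-second maximum is first exceeded at n = 16, since 2^15 + 30 + 59 = 32857 still fits while 2^16 = 65536 does not.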
Next, we set the visibility timeout of the message to the value we just calculated. The new timeout counts from the moment
of the call, and the maximum value allowed by SQS is 43200 seconds, or 12 hours.
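Putting the calculation and the ceiling together, here is a self-contained sketch that also reads the receive count. ApproximateReceiveCount is the real SQS attribute name (in a Lambda handler it would come from SQSEvent.SQSMessage.getAttributes()), but the class and method names are hypothetical, and a plain Map stands in for the message so the code runs without the AWS SDK. The returned int is the value that would go to changeMessageVisibility.

```java
import java.util.Map;
import java.util.Random;

/** Hypothetical helper: computes an exponential visibility timeout with jitter. */
final class BackoffTimeout {
    static final long MAX_VISIBILITY_TIMEOUT_SECONDS = 43_200; // SQS maximum: 12 hours
    static final long BASE_SECONDS = 30;                        // fixed floor from the snippet
    static final int JITTER_BOUND_SECONDS = 60;                 // nextInt(60) returns 0..59

    /** Reads the receive count SQS reports for a delivered message (1 on first delivery). */
    static int receiveCount(Map<String, String> attributes) {
        // Falls back to 1 if the attribute is missing (an assumption for this sketch).
        return Integer.parseInt(attributes.getOrDefault("ApproximateReceiveCount", "1"));
    }

    /** Returns 2^recvCount + 30 + jitter seconds, capped at the SQS maximum. */
    static int visibilityTimeout(int recvCount, Random rand) {
        // A bit shift computes 2^n exactly; clamping n keeps it from overflowing,
        // and 2^20 is already far past the 43200-second ceiling.
        int exponent = Math.max(0, Math.min(recvCount, 20));
        long timeout = (1L << exponent) + BASE_SECONDS + rand.nextInt(JITTER_BOUND_SECONDS);
        return Math.toIntExact(Math.min(timeout, MAX_VISIBILITY_TIMEOUT_SECONDS));
    }

    public static void main(String[] args) {
        Map<String, String> attributes = Map.of("ApproximateReceiveCount", "3");
        int n = receiveCount(attributes);
        // Prints a value between 38 and 97 seconds; the jitter makes it vary per run.
        System.out.println("receive count " + n + " -> timeout " + visibilityTimeout(n, new Random()) + " s");
    }
}
```

Using 1L << n instead of Math.pow keeps the arithmetic in integers, and Math.toIntExact makes an unexpected overflow fail loudly rather than silently wrap.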
// sqs: an AmazonSQS client; queueUrl: URL of the queue the message came from
sqs.changeMessageVisibility(queueUrl, msg.getReceiptHandle(), Math.toIntExact(result));
Finally, we check the response to our API call. If it is a 4xx or 5xx response, we change the visibility timeout of the
message and then throw a RuntimeException. This is the easiest way I could come up with to signal that the Lambda
function failed, so Lambda leaves the message on the queue instead of deleting it. Plus, we can only throw unchecked
exceptions in our handler method.
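...
// api call
...
if (response.getStatusLine().getStatusCode() >= 400) {
    // 4xx or 5xx: push the next delivery out by the exponential visibility timeout
    new ExponentialBackoff().setVisibilityTimeout(msg);
    // fail the invocation so Lambda leaves the message on the queue
    throw new RuntimeException("Request to server failed");
}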
ExponentialBackoff is my utility class where the code that calculates and sets the visibility timeout lives. It also has some other
utility functions that are not essential for this demonstration.
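To see the whole flow in one place, here is a hedged sketch of how the pieces might fit together. The real ExponentialBackoff class isn't shown in this post, so this version is hypothetical: its setVisibilityTimeout takes a receipt handle and receive count rather than the message, the VisibilityUpdater interface stands in for the SQS client (in the real function it would wrap sqs.changeMessageVisibility), the status code is passed in directly instead of coming from an HTTP response, and the jitter source is injectable so the demo output is predictable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

final class HandlerFlow {
    /** Mirrors the handler logic: on a 4xx/5xx response, back off, then fail the invocation. */
    static void handle(int statusCode, String receiptHandle, int recvCount, ExponentialBackoff backoff) {
        if (statusCode >= 400) {
            backoff.setVisibilityTimeout(receiptHandle, recvCount);
            // Failing the invocation stops Lambda from deleting the message.
            throw new RuntimeException("Request to server failed");
        }
        // Returning normally lets Lambda delete the message from the queue.
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        ExponentialBackoff backoff = new ExponentialBackoff(
                (receipt, timeout) -> calls.add(receipt + " -> " + timeout + " s"),
                () -> 0); // zero jitter keeps the demo output predictable
        try {
            handle(503, "receipt-123", 4, backoff);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage() + "; " + calls); // Request to server failed; [receipt-123 -> 46 s]
        }
        handle(200, "receipt-456", 1, backoff); // success: no visibility change
        System.out.println(calls.size()); // 1
    }
}

/** Hypothetical stand-in for sqs.changeMessageVisibility(queueUrl, receiptHandle, timeout). */
interface VisibilityUpdater {
    void changeVisibility(String receiptHandle, int timeoutSeconds);
}

/** Hypothetical shape of the ExponentialBackoff utility described above. */
final class ExponentialBackoff {
    private final VisibilityUpdater sqs;
    private final IntSupplier jitter; // in production, e.g. () -> rand.nextInt(60)

    ExponentialBackoff(VisibilityUpdater sqs, IntSupplier jitter) {
        this.sqs = sqs;
        this.jitter = jitter;
    }

    /** Delays the next delivery by 2^recvCount + 30 + jitter seconds, capped at 12 hours. */
    void setVisibilityTimeout(String receiptHandle, int recvCount) {
        int exponent = Math.max(0, Math.min(recvCount, 20)); // avoid shift overflow
        long timeout = (1L << exponent) + 30 + jitter.getAsInt();
        sqs.changeVisibility(receiptHandle, (int) Math.min(timeout, 43_200));
    }
}
```

Injecting the SQS call and the jitter source is one way to make the backoff logic testable without AWS; the order of operations (change the visibility first, then throw) matches the handler snippet.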
There you have it: a bare-bones exponential backoff implementation for AWS Lambda.