Why Does Redis Cache Time Out in an Azure Function App on the Consumption Plan? - A Journey
Asad Raheem
Posted on July 14, 2020
I decided to move a power-user feature to an Azure Function App. The feature used Redis Cache extensively. In a controlled environment, the move resulted in better scalability and performance.
The problem?
Redis timeout exceptions were being thrown in production. Always? No. Sometimes? Yes, and that was an even bigger problem because it made the root cause difficult to trace.
I was following the approach mentioned in Microsoft documentation.
    // The connection is created on first access to Connection and then reused.
    private static Lazy<ConnectionMultiplexer> lazyConnection = new Lazy<ConnectionMultiplexer>(() =>
    {
        string cacheConnection = ConfigurationManager.AppSettings["CacheConnection"].ToString();
        return ConnectionMultiplexer.Connect(cacheConnection);
    });

    public static ConnectionMultiplexer Connection
    {
        get
        {
            return lazyConnection.Value;
        }
    }
First Hunch
The Redis server load might have exceeded what the plan allows. To my surprise, that was not the case: the server load hardly ever exceeded 10%.
Second Hunch
The Redis server is single-threaded, so an overly large object in the cache might have been blocking other commands. The documentation warns:
Avoid using certain Redis commands that take a long time to complete, unless you fully understand the impact of these commands. For example, do not run the KEYS command in production. Depending on the number of keys, it could take a long time to return. Redis is a single-threaded server and it processes commands one at a time. If you have other commands issued after KEYS, they will not be processed until Redis processes the KEYS command.
That was also not the case.
Third Hunch
Another feature synchronously fetching a large object from Redis might have been causing the issue, but that just didn't make sense: such features weren't used frequently.
Fourth Hunch
Noisy neighbors. I was using the Azure Redis Cache Standard tier C0 plan, and it turns out C0 plans aren't meant for production use.
The Basic tier is a single node system with no data replication and no SLA. Also, use at least a C1 cache. C0 caches are meant for simple dev/test scenarios since they have a shared CPU core, little memory, and are prone to "noisy neighbor" issues.
I upgraded the plan and waited patiently. The issue persisted.
Time for Experimentation
I built a test harness that generated a large number of asynchronous requests to Redis Cache using the same lazy initialization pattern.
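Voilà! The much-awaited timeout finally occurred on my local system. It occurred when multiple threads tried to access the cache at the same time: with the lazy initialization pattern above, every request in the burst tried to obtain the connection while it was still being initialized. The documentation for LazyThreadSafetyMode.None says the following (the Lazy<T>(Func<T>) constructor used above actually defaults to ExecutionAndPublication, which is thread safe, but the exception caching described at the end of the passage applies to that default mode as well):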
The Lazy<T> instance is not thread safe; if the instance is accessed from multiple threads, its behavior is undefined. Use this mode only when high performance is crucial and the Lazy<T> instance is guaranteed never to be initialized from more than one thread. If you use a Lazy<T> constructor that specifies an initialization method (valueFactory parameter), and if that initialization method throws an exception (or fails to handle an exception) the first time you call the Value property, then the exception is cached and thrown again on subsequent calls to the Value property.
But how was this occurring in production? The answer: the Consumption plan.
On the Consumption plan, the function app is not always running. The Redis connection was initialized whenever the function was triggered by an Azure Storage Queue message. The problem occurred when the function app received a burst of messages while it either wasn't already running or was scaling out.
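A burst like this can be reproduced without a Redis server. The following is a minimal sketch under stated assumptions, not the original harness: BurstHarness and CountFactoryRuns are hypothetical names, and a 100 ms Thread.Sleep stands in for ConnectionMultiplexer.Connect. It releases a number of threads at once against a single Lazy<object> and counts how often the initialization method runs for a given LazyThreadSafetyMode.

```csharp
using System;
using System.Threading;

public static class BurstHarness
{
    // Hypothetical helper: releases `callers` threads at once against a single
    // Lazy<object> and reports how many times the factory actually ran.
    public static int CountFactoryRuns(LazyThreadSafetyMode mode, int callers)
    {
        int factoryRuns = 0;
        var lazy = new Lazy<object>(() =>
        {
            Interlocked.Increment(ref factoryRuns);
            Thread.Sleep(100); // assumption: stand-in for a slow Connect call
            return new object();
        }, mode);

        using var startGate = new ManualResetEventSlim(false);
        var threads = new Thread[callers];
        for (int i = 0; i < callers; i++)
        {
            threads[i] = new Thread(() =>
            {
                startGate.Wait(); // hold every caller until the burst starts
                _ = lazy.Value;   // all callers read the Lazy together
            });
            threads[i].Start();
        }

        startGate.Set();                      // release the burst
        foreach (var t in threads) t.Join();  // wait for every caller to finish
        return factoryRuns;
    }
}
```

Calling CountFactoryRuns(LazyThreadSafetyMode.ExecutionAndPublication, 8) exercises the mode that the Lazy<T>(Func<T>) constructor uses by default: the factory runs once while the rest of the burst waits on it. Other modes can be passed in to compare their behavior.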
Solution
Pass a LazyThreadSafetyMode value to the constructor. Yes, that's it. Other than None, there are two options: PublicationOnly and ExecutionAndPublication. For my use case, I needed PublicationOnly, as stated in the documentation:
When multiple threads try to initialize a Lazy<T> instance simultaneously, all threads are allowed to run the initialization method (or the parameterless constructor, if there is no initialization method). The first thread to complete initialization sets the value of the Lazy<T> instance. That value is returned to any other threads that were simultaneously running the initialization method, unless the initialization method throws exceptions on those threads. Any instances of T that were created by the competing threads are discarded.
    // Requires: using System.Threading; (for LazyThreadSafetyMode)
    private static Lazy<ConnectionMultiplexer> lazyConnection = new Lazy<ConnectionMultiplexer>(() =>
    {
        string cacheConnection = ConfigurationManager.AppSettings["CacheConnection"].ToString();
        return ConnectionMultiplexer.Connect(cacheConnection);
    }, LazyThreadSafetyMode.PublicationOnly); // the first completed initialization wins

    public static ConnectionMultiplexer Connection
    {
        get
        {
            return lazyConnection.Value;
        }
    }
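One documented difference between the modes is how they treat a failing initialization method: with ExecutionAndPublication (the default for the Lazy<T>(Func<T>) constructor) an exception thrown by the factory is cached and rethrown on every later read, while PublicationOnly does not cache exceptions, so a later read runs the factory again. The sketch below illustrates that difference. It is an illustration only: LazyRetryDemo and SecondReadSucceeds are hypothetical names, and the factory simulates a connection attempt that times out once instead of talking to Redis.

```csharp
using System;
using System.Threading;

public static class LazyRetryDemo
{
    // Hypothetical helper: the factory fails on its first call and succeeds
    // afterwards, imitating a connection attempt that times out once.
    // Returns true if a second read of Value succeeds after the first one failed.
    public static bool SecondReadSucceeds(LazyThreadSafetyMode mode)
    {
        int attempts = 0;
        var lazy = new Lazy<string>(() =>
        {
            if (Interlocked.Increment(ref attempts) == 1)
                throw new TimeoutException("simulated connect timeout");
            return "connected";
        }, mode);

        try { _ = lazy.Value; }        // first read: the factory throws
        catch (TimeoutException) { }

        try { return lazy.Value == "connected"; }   // second read
        catch (TimeoutException) { return false; }  // the cached exception was rethrown
    }
}
```

SecondReadSucceeds(LazyThreadSafetyMode.ExecutionAndPublication) returns false because the first failure is cached, whereas SecondReadSucceeds(LazyThreadSafetyMode.PublicationOnly) returns true because the factory is run again.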
The fix itself was simple, but figuring out the exact conditions in the production environment was difficult.
Note: In the above code snippets, ConfigurationManager is used to access App Settings. I wrote it that way to stay consistent with the documentation. Since Azure Functions v2, Environment.GetEnvironmentVariable should be used instead.
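As a rough sketch of that change, the setting could be read through a small helper. CacheSettings and GetRequiredSetting are hypothetical names introduced here for illustration; only Environment.GetEnvironmentVariable comes from the note above. The helper also fails with a descriptive message when the setting is missing, which is a design choice of this sketch rather than something the documentation prescribes.

```csharp
using System;

public static class CacheSettings
{
    // Hypothetical helper: reads an app setting from the environment and
    // throws a descriptive exception if it is missing or empty.
    public static string GetRequiredSetting(string name)
    {
        var value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrEmpty(value))
            throw new InvalidOperationException($"App setting '{name}' is not set.");
        return value;
    }
}

// Assumed usage inside the Lazy factory from the snippets above:
//   return ConnectionMultiplexer.Connect(CacheSettings.GetRequiredSetting("CacheConnection"));
```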
I hope this helps.