Caching Problems You Should Know
Truong Vu
Posted on November 18, 2024
Caching is one of the most powerful tools for improving application performance and scalability: it reduces back-end load, decreases latency, and delivers a better user experience. However, it also comes with its own challenges.
As a developer, I’ve relied on caching to optimize application performance. One of my most frequently used approaches is the Cache-Aside strategy - it’s simple and works well for most scenarios. Along the way, I’ve encountered issues like the Thundering Herd problem, Cache Stampede, and several others, which made me rethink how I handle caching.
In this post, I’ll share my personal experiences with these problems, what went wrong, and how I resolved them.
How Cache-Aside Strategy Works
Basically, the application first checks whether the item is currently held in the cache. If the item is not available, it reads the item from the database (MySQL in our case). It then stores a copy of the item in the cache so subsequent requests can be served from there.
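Here’s a minimal sketch of that flow, assuming redis-py for the cache layer and a hypothetical `load_from_mysql` helper standing in for the real database query:

```python
import json
import redis

r = redis.Redis()
TTL_SECONDS = 300  # 5-minute expiration, as in the project described below

def load_from_mysql(content_id):
    # Placeholder for the real MySQL query.
    return {"id": content_id, "body": "..."}

def get_content(content_id):
    key = f"content:{content_id}"
    # 1. Check whether the item is currently held in the cache.
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    # 2. Cache miss: read the item from the database.
    item = load_from_mysql(content_id)
    # 3. Store a copy in the cache for subsequent requests.
    r.setex(key, TTL_SECONDS, json.dumps(item))
    return item
```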
What went wrong
One of my projects involved a high-traffic API for delivering editorial content. We used the cache-aside strategy to store the content with a 5-minute expiration. It worked fine until it didn’t.
- Popular content items expired at the same time.
- The back-end was flooded with requests and struggled to handle the load.
- A huge number of requests hit the same expired key, and all of them fell through to the database directly - it simply couldn’t keep up.
Solutions
After digging into the issue, it became clear that the back-end had scaled up to its maximum allowed capacity, while the database was overwhelmed by the surge in requests. This led to degraded performance and, at times, system crashes.
Thundering Herd Problem
Workaround: One approach is to use randomized expiration times for cache keys, which prevents them from expiring simultaneously. There are other solutions to this issue as well; if you find it interesting, a quick search will reveal many helpful resources.
Pseudo-code:
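A minimal sketch of the idea, assuming the same redis-py setup as above and an arbitrary one-minute jitter window:

```python
import json
import random

BASE_TTL = 300   # the original 5-minute expiration
MAX_JITTER = 60  # spread expirations over an extra minute (arbitrary choice)

def set_with_jitter(key, item):
    # Adding a random offset to the TTL keeps hot keys from
    # all expiring at the same instant.
    ttl = BASE_TTL + random.randint(0, MAX_JITTER)
    r.setex(key, ttl, json.dumps(item))
```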
Cache Stampede
Workaround: Since I was using Redis as the caching layer, I tackled this problem by implementing a Redis distributed lock. The lock ensures that only one process can update the cache at a time, while the other processes wait for the update to complete. This approach effectively prevents multiple processes from overwhelming the back-end simultaneously.
Pseudo-code:
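A minimal sketch of the lock-based rebuild, reusing `r`, `TTL_SECONDS`, and the hypothetical `load_from_mysql` from the earlier snippet; the lock key, timeouts, and retry loop are illustrative choices, not exact production values:

```python
import json
import time

def get_content_with_lock(content_id, retries=50):
    key = f"content:{content_id}"
    lock_key = f"lock:{key}"
    for _ in range(retries):
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
        # SET NX EX is atomic: only one process acquires the lock,
        # and the 10-second expiry guards against a crashed holder.
        if r.set(lock_key, "1", nx=True, ex=10):
            try:
                item = load_from_mysql(content_id)
                r.setex(key, TTL_SECONDS, json.dumps(item))
                return item
            finally:
                r.delete(lock_key)
        # Another process is rebuilding the cache: wait briefly and retry.
        time.sleep(0.05)
    raise TimeoutError(f"cache rebuild for {key} did not complete")
```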
Lesson Learned
If you’re using caching, challenges like the Thundering Herd and Cache Stampede are inevitable. My experience has taught me that no setup is perfect - it’s better to plan for these failure modes up front.
Have you faced similar caching challenges? How did you overcome them?