Supercharge your application's queries with caching

sdbarlow

Posted on September 10, 2024

Have you ever wondered how some websites seem to load information almost instantly, even when dealing with massive amounts of data? The secret often lies in a technique called caching. In this post, we'll explore what caching is, why it's important, and how I implemented it to supercharge my application's leaderboard feature.

What is Caching? A Library Analogy

Imagine a vast library where a librarian must search through millions of books for each patron's request. Now, picture a small billboard at the entrance displaying answers to common questions. Instead of searching the entire library each time, the librarian can quickly refer to the billboard for frequent queries.

In this analogy:

  • The vast library represents your database, full of information but slow to search through.

  • The librarian represents your server, processing requests.

  • The patrons are users asking for information.

  • The billboard represents the cache - a small, fast storage area for frequently accessed information.

Caching is like that billboard. It stores a limited amount of frequently accessed data in a place where it can be retrieved very quickly, saving time and reducing the load on your main data storage system.

Why Implement Caching?

In my application, I had a leaderboard feature that worked correctly but inefficiently: it queried the database and recalculated rankings on every request, even when the underlying data hadn't changed. This was like our librarian repeatedly searching the entire library for the same information.

I realized I needed a way to avoid these repetitive, time-consuming searches. Enter caching!

Setting Up Redis: Our Caching System

For my caching system, I chose to use something called Redis. Redis is what we call an "in-memory data structure store." That's a mouthful, so let's break it down:

  • "In-memory" means it keeps information in the computer's RAM, which is super fast to access.

  • Excels at quick data retrieval, ideal for caching

  • Limited memory, but only needs to store frequently accessed data

Here is the code to initialize your Redis client within a Python server (using the redis-py package):

from redis import Redis

redis_client = Redis(
    host='my-redis-server.com',
    port=6379,
    password='secret-password'
)
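
Once the client is connected, reading and writing cached values comes down to Redis's get and set commands. As a minimal sketch (the key, the example value, and the ten-minute expiry here are purely illustrative), caching a computed leaderboard page might look like this:

import json

# Cache a computed leaderboard page for 10 minutes (600 seconds)
page_data = [{"user": "alice", "words_learned": 42}]
redis_client.set("leaderboard:words_learned:AllTime:Global:1",
                 json.dumps(page_data), ex=600)

# Later, read it back - get() returns None if the key is missing or expired
cached = redis_client.get("leaderboard:words_learned:AllTime:Global:1")
if cached is not None:
    page_data = json.loads(cached)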

Creating Our Labels (Cache Keys)

Think of cache keys as labels or addresses for your cached data. They help you quickly locate and access specific pieces of information in your cache, much like how a library's cataloging system helps you find books.

In my leaderboard system, I create these keys like this:

if user_scope == 'Friends':
    cache_key = f"leaderboard:user:{user_id}:{attribute}:{duration}:{user_scope}:{page}"
else:
    cache_key = f"leaderboard:{attribute}:{duration}:{user_scope}:{page}"

Don't worry if this looks like a foreign language - let's break it down!

Think of these keys like labels on folders in a filing cabinet, where each part of the label carries a specific piece of information:

  1. leaderboard: This always comes first. It's like saying "This is for the leaderboard system."

  2. user:{user_id}: This only appears for friend leaderboards. It's like having a personal folder for each user's friends list.

  3. {attribute}: This could be "words_learned" or "daily_streak" - whatever we're ranking.

  4. {duration}: This is either "AllTime" or "Today", telling us the time period for the rankings.

  5. {user_scope}: This is either "Friends" or "Global", indicating if it's a friend leaderboard or for all users.

  6. {page}: This is the page number of the leaderboard, as we show the results in chunks.

So, a complete cache key might look like this:
leaderboard:user:12345:words_learned:AllTime:Friends:1

This tells our caching system: "Find the leaderboard for user 12345's friends, ranking by words learned, for all time, page 1."
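
With a key in hand, the lookup follows the classic cache-aside pattern: check the "billboard" first, and only go back to the "library" (the database) on a miss. Here's a hedged sketch of how that might look; query_leaderboard_from_db is a hypothetical stand-in for your real database query, and the five-minute expiry is just an example value:

import json

CACHE_TTL_SECONDS = 300  # assumed expiry, tune to taste

def get_leaderboard(user_id, attribute, duration, user_scope, page):
    # Build the same cache key shown above
    if user_scope == 'Friends':
        cache_key = f"leaderboard:user:{user_id}:{attribute}:{duration}:{user_scope}:{page}"
    else:
        cache_key = f"leaderboard:{attribute}:{duration}:{user_scope}:{page}"

    # 1. Check the "billboard" first
    cached = redis_client.get(cache_key)
    if cached is not None:
        return json.loads(cached)

    # 2. Cache miss: do the slow "library search" (database query)
    results = query_leaderboard_from_db(user_id, attribute, duration, user_scope, page)

    # 3. Post the answer on the billboard for the next request
    redis_client.set(cache_key, json.dumps(results), ex=CACHE_TTL_SECONDS)
    return results

On a cache hit, the database is never touched at all - that's where the speed-up comes from.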

But Wait! What About Changes?

You might be thinking, "Hold on a second. What if a user increases their score or position in the leaderboard? Won't the information in our cache become outdated?"

That's a great question! You're absolutely right - we need a way to make sure our cached leaderboard data stays fresh and accurate. Let's dive into how we handle this.

Our Leaderboard Refresh System

Here's how we keep our leaderboard cache fresh:

  1. Detecting Changes: Whenever a user does something that could affect their ranking (like learning new words or extending their streak), our system makes a note of it.

  2. Updating Specific Leaderboards: Instead of refreshing all leaderboards (which would be like rewriting every answer on the billboard), we only update the ones that this change could affect.

For example, if Alice learns a new word:

  • We update the "Words Learned" leaderboard
  • We update both the "All Time" and "Today" duration leaderboards
  • We update Alice's friends' leaderboard and the global leaderboard

  3. Smart Invalidation: We don't immediately recalculate the entire leaderboard. Instead, we mark those specific leaderboard caches as "stale".

Here's a simplified version of what this looks like in action:

def update_user_score(user_id, attribute):
    # Update the user's score in the database
    update_database_score(user_id, attribute)

    # Mark relevant leaderboard caches as stale
    invalidate_leaderboard_cache(user_id, attribute)

def invalidate_leaderboard_cache(user_id, attribute):
    # List of cache keys to invalidate
    keys_to_invalidate = [
        f"leaderboard:{attribute}:AllTime:Global",
        f"leaderboard:{attribute}:Today:Global",
        f"leaderboard:user:{user_id}:{attribute}:AllTime:Friends",
        f"leaderboard:user:{user_id}:{attribute}:Today:Friends"
    ]

    # Mark these keys as stale in Redis
    for key in keys_to_invalidate:
        redis_client.set(f"{key}:stale", "true")

This system ensures efficiency, up-to-date information, and continued caching benefits for unaffected leaderboards.
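
On the read side, the request handler then checks for that stale marker before trusting the cached copy. The sketch below shows one way this could be wired up (it's illustrative, not a verbatim excerpt from my app); recompute_leaderboard is a hypothetical helper that re-runs the database query and ranking logic:

import json

def get_cached_leaderboard(cache_key):
    stale = redis_client.get(f"{cache_key}:stale")   # any value here means "stale"
    cached = redis_client.get(cache_key)

    if cached is not None and stale is None:
        # Fresh hit: serve straight from the cache
        return json.loads(cached)

    # Miss or stale: recompute from the database, refresh the cache, clear the flag
    results = recompute_leaderboard(cache_key)
    redis_client.set(cache_key, json.dumps(results))
    redis_client.delete(f"{cache_key}:stale")
    return results

Deleting the cache key outright on invalidation is an equally common approach; the stale-flag version shown here simply mirrors the invalidation code above.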

Enough Talk, Let's See It in Action!

The first video demonstrates the leaderboard screen prior to implementing caching; the second shows it with caching enabled.

[Video 1: leaderboard loading before caching]

[Video 2: leaderboard loading with caching]

You can see just how much faster the response is from the server with caching implemented!

By implementing caching in your own applications, you can achieve similar performance boosts, providing a smoother, faster experience for your users. Remember, the key is to cache smartly - store frequently accessed data, keep it fresh, and enjoy the benefits of lightning-fast data retrieval!
