Advanced caching mechanisms using distributed locks and async pub/sub systems.

Kartik

Posted on August 17, 2022


Intro

The impact of data caching has grown as fast-access storage has become more effective and cheaper than recomputing the same data on the CPU.

Caching has many applications and is used at different levels of an I/O or computation process. It is not a magic bullet for every use case, and there are always tradeoffs. Striking the right balance between those tradeoffs and the requirements yields an optimal design, which depends on the use case of the service or application.

In the context of a web service, both the server and the client are valid places to store a cache. A client-side cache mostly solves problems for a particular user, such as caching images, JS, etc. Server-side caching aims to solve problems for a group of users or services.
In this article, we will discuss server-side caching strategies for REST APIs in depth.


Architecture setup

Let's consider a web service that serves content through an API for a CMS, eCommerce, EdTech, etc. Consider the following environment/architecture for the service, to which we will apply our caching strategies.

Node SuperCache -- Example setup

The setup consists of a Layer-4 Load Balancer routing traffic to one or more Layer-7 Application Load Balancers (L7-ALBs). The L7-ALB is optional in this setup and is needed only when requests have to be routed further based on complex rules or other operations such as filtering, tagging, etc. The L7-ALBs in our setup route requests to web servers, which then need to make at least one database query to respond to a request.

A Layer-7 Database Load Balancer (L7-DBLB) is optional here and can be used for various purposes (e.g., ProxySQL). One or more database instances handle queries from the web servers. Client-side DB query/connection load balancing can also be used instead of an L7-DBLB, depending on the application's use case.

Let’s consider an API with an endpoint /api/v1/products which will be used to list, get one, and filter products of an eCommerce service.

In our scenario, this API may be used more than once with various filters in multiple sections on a web page to display different sets of products like electronics, clothing, xx% discounts, etc. If we consider an optimised service, the API might be used only once to render all sections of the web page.
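For illustration, the different page sections might call the same endpoint with different filters, roughly like this (the fetch calls and filter parameters below are hypothetical, just to show the usage pattern):

    // Hypothetical calls made while rendering different sections of the page
    fetch('/api/v1/products?category=electronics&limit=10');
    fetch('/api/v1/products?category=clothing&limit=10');
    fetch('/api/v1/products?discount_gte=20&limit=10'); // the "xx% discounts" section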

We will be working on various ways we can cache this API.

Node SuperCache - Un-cached performance
(This chart is the performance of the API without any caching)

Caching methods

Note: The code snippets used here are just for explanatory purposes, ignoring error handling or optimisation.

Method - 1

This is the most common method used to cache an API, where the API response is cached at the web server. We will be using a Redis server to store the cache.

Node SuperCache -- Method-1

First, we need to decide on the unique key under which to cache the API response. An object containing the request’s URL and body can be used to generate a unique hash key. Here we assume the HTTP API requests are of type GET only.

When an incoming request is received by the web server:

  1. The unique key representing that request is generated. For a Node.js web server, the object-hash npm module can be used, which uses the Node.js Crypto module to generate a hash (SHA1, MD5, etc.) from a JavaScript object. If your use case requires generating the hash of the same object on different platforms (for example, JavaScript and Python), the hash generation should be handled more carefully.

    const objectHash = require('object-hash');

    // Hash the request URL and body into a unique cache key
    const key = objectHash({
      URL: req.originalUrl,
      body: req.body
    });
    
  2. Check the key in the Redis key-value store

    1. If the key exists, respond to the request with the value from the cache (Redis) and end the request handling chain or continue to other handlers accordingly.
    2. If the key does not exist, process the API request and save the API response in the cache with some TTL, then respond to the request.

Caching this way is the simplest and requires no complex changes to the application.
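Below is a minimal sketch of this flow as an Express-style handler, assuming an ioredis client and a hypothetical produceApiResponse() function that runs the actual DB queries and builds the response:

    const Redis = require('ioredis');
    const objectHash = require('object-hash');

    const redis = new Redis();

    // Express-style handler implementing Method-1
    const cachedHandler = async (req, res) => {
      // 1. Unique key for this request
      const key = objectHash({ URL: req.originalUrl, body: req.body });

      // 2.1 Serve from the cache if the key exists
      const cached = await redis.get(key);
      if (cached) {
        return res.status(200).send(JSON.parse(cached));
      }

      // 2.2 Otherwise compute the response, cache it with a TTL, then respond
      const data = await produceApiResponse(req); // hypothetical data producer
      await redis.set(key, JSON.stringify(data), 'EX', 60); // 60-second TTL
      return res.status(200).send(data);
    };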

Consider a scenario where our application receives identical concurrent requests (i.e., the hash key generated from the request body + URL is the same for all of them).

Node SuperCache -- Concurrent requests scenario

When there are N concurrent requests to the same API, all producing the same hash key:

  1. For all the concurrent requests, the web server makes a GET call to Redis to fetch the cached value of that API (total N number of calls)
  2. The GET call returns nil, as the key has either expired or has never been cached.
  3. The web server computes the data to cache as it is not in the cache (total N number of times).
  4. When the web server has the data computed, it caches it with the respective TTL (total N number of SET calls to Redis, and only one actual SET operation on Redis if SETNX).

This caching method does not help in this scenario, as the data to be cached is produced on every request. The time taken to compute the data for request-1 spans t1 to t4, and any requests arriving in this interval do not benefit from the caching system.

Node SuperCache -- Method-1

(Performance of the API with Method-1 caching, under load from 50 virtual users; the response time degrades to ~13 seconds)

This caching method serves well if your use case does not have to deal with frequent concurrent requests or sudden surges in requests for a very short interval.

Method - 2

Caching at Database Load Balancer.

In Method-1 we implemented caching at the application level, which requires an update/patch to the existing API handler to support caching. If the team of engineers handling the service prefers not to implement caching at the application level for some reason (legacy system, 3rd-party ownership, etc.), there is another option to consider.

Generally, in services where a single DB instance is not sufficient to serve the average traffic, a single-master, multiple-replica DB model is adopted. DB queries are routed to the replicas or the master (primary) according to the routing rules set up. There are many client-side libraries that can route DB queries/connections to a configured set of DB instances. A server-side DB load balancer/proxy instance, which sits between the DB instances and the web server instances, offers a few extra capabilities over a client-side DB proxy and is better suited to some use cases.

A Layer-7 Database Load Balancer, which routes all DB queries to the configured DB instances, can also cache DB queries. When a query is executed for the first time, its result is cached, and any identical queries after that are served from the cache and never reach the DB.

Common DB load balancers include ProxySQL for MySQL and HAProxy for Postgres.

Let’s use ProxySQL for our API. The steps to set up and configure ProxySQL are covered in many articles and are not discussed here.

The concurrent-requests case we discussed for Method-1 behaves the same way here; this method does not offer any improvement for that scenario.

This method works well for use cases like the one mentioned at the beginning of this section, or where the database administrator needs to enable/disable caching for one or a set of queries without any deployment or application update.

Method - 3

Caching at the application level using distributed locks.

This method fixes the concurrent requests scenario which is not handled properly by methods 1 and 2.

If we can allow only one of the concurrent requests to compute, then cache the API response and make other requests read from the cache after that, our problem is mostly solved.

We can implement the above solution with the use of locks.

When there are N concurrent requests with identical cache keys (hashed request object) and the requested API response is not yet cached, each request handler will try to lock the resource, check the key in cache again, and then compute the API response.

  1. Every concurrent request handler attempts to acquire a lock on the cache key.
  2. The lock on the cache key is obtained by one of the concurrent request handlers (the primary).
  3. All request handlers except the primary wait until they can acquire the lock.
  4. The primary request handler checks again if the cache key exists in the cache, and if it exists, the cached API response is used and the lock is released. If the key does not exist, the handler proceeds to compute the API response.
  5. Once the API response is computed/produced, the request handler caches the API response, releases the lock, and ends the request.
  6. Once the lock is released, it is acquired by another one of the waiting concurrent request handlers (secondary).
  7. The secondary request handler will check if the API request’s cache key is in the cache, and since the API response is already produced by the primary request, the secondary request handler will find the key in the cache. The secondary request handler will release the lock and end the request.
  8. All the other concurrent requests follow the same flow as the secondary request handler and respond to the requests.

We can see here that the API response is not computed more than once, even in the concurrent request scenario.

Node SuperCache -- Method-3
(Performance of the API with Method-3 caching, under load from 50 virtual users)

Distributed locks are a good fit for this scenario, as there can be numerous clients (web servers in our case) applying locks on shared resources, which requires failover, scaling, and replication strategies to be in place for reliable and consistent usage.

Redlock, which uses Redis to implement distributed locks, is one of the popular strategies used.

Example:
The following example uses the redlock npm module for Node.js, which implements the Redlock algorithm. Redlock implementations are available in many languages.

import Client from "ioredis";
import Redlock from "redlock";

// Redis client used both for caching and for the distributed lock
export const redis = new Client();

export const redlock = new Redlock([redis], {
  driftFactor: 0.01,
  retryCount: 50,
  retryDelay: 200,
  retryJitter: 200,
  automaticExtensionThreshold: 200,
});

// Express-style route handler
const cachedApiHandler = async (req, res) => {
  const key = `test-cache:api_v1-3-test`;

  // Serve directly from the cache when possible
  const cachedData = await redis.get(key);
  if (cachedData) {
    return res.status(200).send(JSON.parse(cachedData));
  }

  try {
    // Acquire the distributed lock on the cache key (10-second TTL)
    const lock = await redlock.acquire([`${key}:_lock`], 10000);

    let hasError = null;

    // Check the cache again: another handler may have filled it while we waited for the lock
    const cachedData2 = await redis.get(key);
    if (cachedData2) {
      await lock.release();
      return res.status(200).send(JSON.parse(cachedData2));
    }

    let data = null;
    try {
      // Produce the API response and cache it for 60 seconds
      data = await apiHandler.testRequest(req, res);
      await redis.set(key, JSON.stringify(data), 'EX', 60);
    } catch (err_1) {
      console.log('test:apiHandlerError: ', err_1);
      hasError = err_1;
    } finally {
      await lock.release();
    }

    if (hasError) {
      return res.status(500).send('ERROR');
    }

    return res.status(200).send(data);
  } catch (err) {
    console.log('test:lockError: ', err);
    return res.status(500).send(err);
  }
};

Note: This code example is just for experimenting and should not be used in production as it is. The later sections cover things to consider when using the discussed methods in production.

Observations:

  1. The lock acquired by the secondary request handlers is redundant; it does nothing other than make the handler wait for the primary handler to complete the caching operation.
  2. N-1 of the N concurrent request handlers acquire the lock one by one, sequentially. This is not optimal behaviour, but it does not affect performance much (thanks to Redis's low-latency operations); the impact depends on the load and the available IOPS of the Redis cluster.
  3. Redlock clients acquire a lock on an already-locked resource by repeatedly retrying at particular intervals. N-1 request handlers (clients) retrying the lock results in a large number of Redis queries. This is not a limitation, but it is a factor to consider for optimisation.

Locks should be used carefully to avoid resource deadlocks. Always assign a TTL to locks acquired on resources.

Method - 4

Caching at the application level using distributed locks (optimised).

In the last method, we successfully dealt with the case of concurrent requests. This method takes care of the redundant locking by the secondary request handlers using a Pub/Sub event system.

When there are N concurrent requests having the same key (hash from request) and the key is not yet cached:

  1. Every concurrent request handler makes a single attempt to acquire a lock on the cache key. A single attempt means the handler does not retry after failing to acquire the lock.
  2. The lock on the cache key is obtained by one of the N concurrent request handlers (the primary). The primary handler then continues to compute the API response for the request.
  3. The remaining N-1 request handlers (secondary) will subscribe to a channel unique to the cache key. All the subscribed handlers will listen for ‘SUCCESS’ or ‘ERROR’ events from the channel.
  4. Once the primary handler completes producing the API response, the API response is cached and the lock on the cache key is released. Once the lock is released by the primary handler, it publishes the ‘SUCCESS’ event to the cache key’s unique channel (where N-1 secondary handlers are subscribers) and ends the request.
  5. The N-1 secondary handlers receive the ‘SUCCESS’ event and then serve the API response from the cache.

Node SuperCache -- Method-4 Stats

(Performance of the API with Method-4 caching, under load from 50 virtual users)

Here we are relying on the Pub/Sub system to notify when an API caching operation is completed.
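A minimal sketch of this flow, assuming an ioredis client, a redlock instance configured with retryCount: 0 (so the lock is attempted only once), and a hypothetical produceApiResponse() function:

    import Client from "ioredis";
    import Redlock from "redlock";

    const redis = new Client(); // cache reads/writes and publishing
    const redlock = new Redlock([redis], { retryCount: 0 }); // single lock attempt, no retries

    const handler = async (req, res) => {
      const key = 'test-cache:api_v1-4-test';
      const channel = `${key}:channel`;

      // Serve from the cache when possible
      const cached = await redis.get(key);
      if (cached) return res.status(200).send(JSON.parse(cached));

      try {
        // Primary path: this handler won the single lock attempt
        const lock = await redlock.acquire([`${key}:_lock`], 10000);

        let data = null;
        let handlerError = null;
        try {
          data = await produceApiResponse(req); // hypothetical data producer
          await redis.set(key, JSON.stringify(data), 'EX', 60);
        } catch (err) {
          handlerError = err;
        }

        await lock.release(); // release first, then notify the waiting handlers
        await redis.publish(channel, handlerError ? 'ERROR' : 'SUCCESS');

        if (handlerError) return res.status(500).send('ERROR');
        return res.status(200).send(data);
      } catch (lockErr) {
        // Secondary path: lock not acquired, wait for the primary's event instead
        const sub = redis.duplicate(); // dedicated connection for subscribing
        sub.on('message', async (ch, message) => {
          await sub.unsubscribe(channel);
          sub.disconnect();
          if (message === 'SUCCESS') {
            const data = await redis.get(key);
            return res.status(200).send(JSON.parse(data));
          }
          return res.status(500).send('ERROR');
        });
        await sub.subscribe(channel);
        // A production implementation should also re-check the cache here and use a
        // timeout, in case the primary published before this handler subscribed.
      }
    };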

Node SuperCache -- Method-4 Flow Diagram
(Flow of the incoming HTTP requests when Method-4 is used)

Node SuperCache is an npm package that implements this caching method properly, covering many edge cases, so you do not need to worry about managing the cache strategy yourself. Currently, the module only supports Redis and is actively maintained.

Interested developers can go through the source code for a better understanding of the underlying implementation.

Selecting a method

No single method is better than the others; the pros and cons of each apply differently depending on the use case of the API and the application.

The main focus of this article is to discuss the case of the concurrent request surge and how to cache API responses effectively without causing performance degradation or potential service outage.

Factors such as traffic patterns, maintenance, server costs, and API data update periods are to be considered to choose an effective caching strategy.

In a use case where serving slightly stale data is acceptable, Method-1 or Method-2 with a high cache TTL can work fine. Method-2 introduces an extra service, and thus increases the maintenance and server-cost component, but it reduces the cache setup time and the application-level implementation effort.

Consider a use case where an API uses an expensive DB query or multiple DB queries and your application can scale horizontally on the web server side. If the cache of the API expires and gets evicted during the peak traffic period, the service gets hit by all the traffic and might become slow/unresponsive.

The effect of a sudden surge in traffic varies according to the type of service. If the web server drops too many DB connections due to connection timeouts while the DB is under heavy load (slow queries), the DB may block the web server host for too many connection errors. The host getting blocked by the DB is a potential service-outage issue if you are using Method-2 and should be handled carefully.

There are solutions such as Warm Pools in AWS, which offer quick horizontal scaling of servers on EC2, but the costs are similar to having the servers always running.

Conclusion

It goes without saying that caching should be considered after proper optimisation of the actual data producer. Caching is not always the only solution for scaling or resource optimisation. Cache invalidation is still an issue for applications and services which use caching extensively.

A materialised cache is another way to cache data, where the cache is generated right after the data is produced and every request for the data is served from the cache. I will discuss materialised caches in my next article.

Support Node SuperCache if you think it is interesting and useful.
