Arnaud
Posted on March 15, 2021
TL;DR: Leverage model caching on commonly used queries, associations and scopes to give your database a break. Rails offers plenty of cheap options to use and expire your cache efficiently. The more you do it, the less unnecessary load reaches your DB.
In most applications, scaling the runtime (= your application servers) is far easier than scaling your database. Using a serverless approach like GCP Cloud Run, you can horizontally scale your app to thousands of instances.
The same cannot be said of databases, especially relational ones. Most of the time they are an expensive bottleneck and are harder to scale. That is why it's good practice to get into the habit of taking load off your database whenever cheap alternatives exist.
Let's take a concrete example: background jobs. Whether you use ActiveJob, Sidekiq, Resque or Cloudtasker for GCP, it's very common to have jobs defined like this:
class MyJob
  def perform(project_id)
    return unless (record = Project.find_by(id: project_id))
    # ... do stuff related to your model ...
    # E.g. longpoll data from a third-party provider
  end
end
It's alright, it's just a find method. But let's imagine thousands of these jobs running constantly. Your DB will certainly cope with it, but the question is: do you really want your DB to spend expensive CPU milliseconds on these kinds of basic queries?
The primary goal of a relational database is to be right, not to be fast. If you're looking to be fast, you should look at other options. Redis caching is one of the most popular ones.
Let's use that pesky find method to see what cheap alternative we have, then dig into other caching opportunities.
Model.find: a quick win
Assuming you have spotted a few models which are read-intensive, the following module will provide a find_cached method which leverages Redis first. The module also expires the cache whenever your model gets updated or destroyed.
# app/models/concerns/has_find_cached.rb
# This module provides a find_cached class method which
# returns a cached version of the record instead of making
# a database call.
#
# The find_cached method relies on find_for_cached which
# specifies how the record should be loaded. If any association
# preloading should be done, then find_for_cached should be
# overridden by the including class.
#
# The cached version gets automatically expired on update,
# destroy or after FIND_CACHED_DURATION (1 day).
module HasFindCached
  extend ActiveSupport::Concern

  # Find cached duration
  FIND_CACHED_DURATION = 1.day

  included do
    # Expire cache key after change actions
    after_commit :expire_find_cached_key, on: %i[update destroy]
  end

  #---------------------------------------
  # Class methods
  #---------------------------------------
  class_methods do
    #
    # Default lookup method. To be overridden by the
    # implementing class if any preloading is required.
    #
    # @param [String] id The ID of the record.
    #
    # @return [ApplicationRecord] The looked up record.
    #
    def find_for_cached(id)
      find_by(id: id)
    end

    #
    # Return the cache key used for the find_cached method.
    #
    # @param [String] id The ID of the record.
    #
    # @return [String] The cache key
    #
    def find_cached_key(id)
      "#{model_name.cache_key}/#{id}/find_cached"
    end

    #
    # Find a cached version of the record. This method is
    # primarily used in import jobs to prevent making too many
    # database calls.
    #
    # This method should only be used for reading persistent attributes,
    # not real-time ones (e.g. project progress or integration status).
    # Preloaded associations are cached along with the record, but the
    # record cache does not get expired when they are updated.
    #
    # @param [String] id The ID of the record.
    #
    # @return [ApplicationRecord, nil] The cached version of the record
    #
    def find_cached(id)
      # Note: the option is `expires_in` (not `expire_in`)
      Rails.cache.fetch(find_cached_key(id), skip_nil: true, expires_in: FIND_CACHED_DURATION) do
        find_for_cached(id)
      end
    end
  end

  #
  # Return the cache key used for the find_cached method.
  #
  # @return [String] The cache key
  #
  def find_cached_key
    @find_cached_key ||= self.class.find_cached_key(id)
  end

  #
  # Expire the cached version of the record.
  #
  def expire_find_cached_key
    # Abort if no changes were actually applied to the record
    return unless saved_changes.present? || destroyed?

    # Expire cached version
    Rails.cache.delete(self.class.find_cached_key(id))
  end
end
You can use this module in your ActiveRecord models like this:
class Project < ApplicationRecord
  include HasFindCached
  # ...
end
Then update your find calls with:
class MyJob
  def perform(project_id)
    return unless (record = Project.find_cached(project_id))
    # ... do stuff related to your model ...
    # E.g. longpoll data from a third-party provider
  end
end
That's all you need. You've just saved your database potentially thousands of unnecessary calls.
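To make the effect concrete, here is a minimal plain-Ruby sketch of the read-through behavior that find_cached relies on. TinyCache and FakeDb are hypothetical stand-ins written for this illustration (they are not Rails or Redis APIs); the sketch only assumes that fetch returns a stored value when present, runs the block otherwise, and does not store nil when skip_nil is set.

```ruby
# Hypothetical in-memory stand-in for Rails.cache (illustration only).
class TinyCache
  def initialize
    @store = {}
  end

  # Return the stored value for key, or run the block and store its result.
  # With skip_nil: true, a nil result is returned but never stored.
  def fetch(key, skip_nil: false)
    return @store[key] if @store.key?(key)

    value = yield
    @store[key] = value unless value.nil? && skip_nil
    value
  end

  def delete(key)
    @store.delete(key)
  end
end

# Hypothetical "database" that counts how many lookups reach it.
class FakeDb
  attr_reader :queries

  def initialize(rows)
    @rows = rows
    @queries = 0
  end

  def find_by_id(id)
    @queries += 1
    @rows[id]
  end
end

db = FakeDb.new({ 1 => { id: 1, name: "Apollo" } })
cache = TinyCache.new
find_cached = lambda do |id|
  cache.fetch("projects/#{id}/find_cached", skip_nil: true) { db.find_by_id(id) }
end

1_000.times { find_cached.call(1) }
puts db.queries # => 1 (one query served 1,000 lookups)

3.times { find_cached.call(42) } # missing record: nil is not cached
puts db.queries # => 4

cache.delete("projects/1/find_cached") # what the after_commit callback does
find_cached.call(1)
puts db.queries # => 5
```

The middle step shows why skip_nil matters: without it, a lookup made before a record exists would keep returning nil until the key expires.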
"Wait! I usually need to access parent associations through this model, so I would still be making database calls!" Not if you eager load associations in the cached version of your record.
The module above allows you to customize the cached version of your record via find_for_cached. Example:
class Project < ApplicationRecord
  include HasFindCached
  belongs_to :company

  # Eager load the parent company on the cached version
  # returned by find_cached.
  # This must be a class method, since find_cached calls it on the class.
  def self.find_for_cached(id)
    eager_load(:company).find_by(id: id)
  end
end
There is a caveat though: the cached company association will not be expired upon company update. This is fine if you only need to access persistent attributes on the company association, but if you need to access regularly updated attributes, you must manually expire the project cache upon company update.
Cache expiration of associated models can be achieved through an after_commit callback, such as:
class Company < ApplicationRecord
  has_many :projects

  # Expire project cache keys after change actions
  after_commit :expire_associated_find_cached_keys, on: %i[update destroy]

  # ...

  private

  # Expire cache keys of associated records
  def expire_associated_find_cached_keys
    # Abort if no changes were actually applied to the record
    return unless saved_changes.present? || destroyed?

    # Collect all project cache keys for find_cached
    project_cache_keys = projects.pluck(:id).map { |e| Project.find_cached_key(e) }

    # Delete them in one go
    Rails.cache.delete_multi(project_cache_keys)
  end
end
The find_cached version of your projects will now be properly expired on parent model updates.
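The staleness caveat and its fix can be reproduced in a few lines of plain Ruby. The CachedProjects class below is a hypothetical stand-in (plain hashes instead of ActiveRecord and Redis), so treat it as a sketch of the mechanism rather than production code: the company is embedded in the cached project, and renaming the company only refreshes the cached projects when their keys are expired.

```ruby
# Hypothetical in-memory model of projects cached with their company embedded.
class CachedProjects
  def initialize(companies, projects)
    @companies = companies # { company_id => { name: "..." } }
    @projects = projects   # { project_id => { company_id: ... } }
    @cache = {}
  end

  def key(project_id)
    "projects/#{project_id}/find_cached"
  end

  # Cached project with a copy of its company (akin to eager loading).
  def find_cached(project_id)
    @cache[key(project_id)] ||= begin
      project = @projects.fetch(project_id)
      project.merge(company: @companies.fetch(project[:company_id]).dup)
    end
  end

  # Update a company and, unless told otherwise, expire the cache keys
  # of all its projects (the role played by the after_commit callback).
  def rename_company(company_id, name, expire: true)
    @companies[company_id][:name] = name
    return unless expire

    ids = @projects.select { |_, project| project[:company_id] == company_id }.keys
    ids.each { |id| @cache.delete(key(id)) }
  end
end

store = CachedProjects.new(
  { 10 => { name: "Acme" } },
  { 1 => { company_id: 10 }, 2 => { company_id: 10 } }
)

store.find_cached(1) # warm the cache
store.rename_company(10, "Acme Corp", expire: false)
puts store.find_cached(1)[:company][:name] # => Acme (stale)

store.rename_company(10, "Acme Inc")
puts store.find_cached(1)[:company][:name] # => Acme Inc (fresh)
```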
Now let's keep in mind it's a tradeoff. The more you link records together for cache expiration and the more these related records are updated, the less you'll benefit from your cache.
If all you need on your cached Project versions is to access persistent company references that will never change (e.g. an external customer ID), then you might actually be better off not expiring the Project cache keys upon company update. But if you go down that path, make sure other developers are aware of this caveat, because relying on stale record attributes leads to bugs that are difficult to troubleshoot.
Caching is an opportunistic habit, not a silver bullet
The previous section is simplistic and looks at the most basic form of caching: the find method. This is not going to save your application from DB overload. But it opens the path to more complicated caching approaches.
As an example, let's look at the Company <-> Project relationship. If some_company.projects is a call you frequently make, and assuming the number of projects returned is expected to be reasonable, you can provide a cached version of this association in the following manner.
class Company < ApplicationRecord
  has_many :projects

  # Return the cache key used to cache the list of projects
  def self.projects_cached_key(id)
    "#{model_name.cache_key}/#{id}/projects"
  end

  # Return a cached version of the list of projects associated
  # with this record.
  def projects_cached
    Rails.cache.fetch(self.class.projects_cached_key(id)) do
      # Load the records explicitly so that an array of projects
      # is cached rather than a lazy relation
      projects.to_a
    end
  end
end

class Project < ApplicationRecord
  belongs_to :company

  # Expire project cache keys after change actions
  # Note that unlike the previous example, we use a plain `after_commit`
  # rather than `after_commit on: %i[update destroy]`:
  # creating a project should also lead to cache expiration.
  after_commit :expire_associated_cached_keys

  # ...

  private

  # Expire cache keys of associated records
  def expire_associated_cached_keys
    # Abort if no changes were actually applied to the record
    return unless saved_changes.present? || destroyed?

    # Expire the parent company cache
    Rails.cache.delete(Company.projects_cached_key(company_id))
  end
end
The same approach can be used for scopes, large queries involving joins, etc.
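For scopes that take arguments, each distinct set of arguments needs its own cache key. The helper below is a hypothetical sketch (scope_cache_key is not a Rails method, and the key layout is an assumption) that builds a deterministic key from a model name, a scope name and its arguments, regardless of the order in which the arguments are given.

```ruby
require "digest"

# Hypothetical helper: deterministic cache key for a scope and its arguments.
# Arguments are normalized (string keys, sorted) before being fingerprinted,
# so the same arguments always map to the same key.
def scope_cache_key(model, scope, args = {})
  normalized = args.map { |k, v| [k.to_s, v] }.sort_by(&:first)
  fingerprint = Digest::SHA256.hexdigest(normalized.inspect)[0, 12]
  "#{model}/scopes/#{scope}/#{fingerprint}"
end

a = scope_cache_key("projects", "recent", status: "active", limit: 10)
b = scope_cache_key("projects", "recent", limit: 10, status: "active")
c = scope_cache_key("projects", "recent", status: "archived", limit: 10)

puts a == b # => true (argument order does not matter)
puts a == c # => false (different arguments, different key)
```

Keep in mind that a write can affect many argument combinations at once, so these keys are harder to expire one by one. Pairing them with a short expires_in is one possible compromise.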
In the end, the hardest part is thinking about which resources are involved in your cache and placing the right expiration calls on your associated models.
Now as the title says, it's an opportunistic habit. There is no point in caching every single database call in Redis as it will clutter your application code more than anything.
Your first habit should be to look at your database monitoring system. Tools such as New Relic, Datadog or GCP Query Insights will give you hints on which queries are expensive and frequently run.
Target these first. Once you've addressed the most expensive queries you can evaluate where to further optimize database calls.
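One simple way to combine "expensive" and "frequently run" is to rank queries by the total database time they consume, i.e. call rate multiplied by mean duration. The sketch below assumes you can export those two numbers from your monitoring tool; QueryStat, caching_candidates and the sample figures are all made up for illustration.

```ruby
# Hypothetical per-query statistics, as one might export from a monitoring tool.
QueryStat = Struct.new(:sql, :calls_per_min, :mean_ms) do
  # Total database time consumed per minute, in milliseconds.
  def total_ms_per_min
    calls_per_min * mean_ms
  end
end

# Return the queries consuming the most total database time.
def caching_candidates(stats, top: 3)
  stats.sort_by { |stat| -stat.total_ms_per_min }.first(top)
end

# Made-up figures, for illustration only.
stats = [
  QueryStat.new("SELECT * FROM projects WHERE id = $1", 12_000, 0.4),
  QueryStat.new("SELECT * FROM projects WHERE company_id = $1", 900, 6.0),
  QueryStat.new("SELECT * FROM audit_logs WHERE created_at > $1", 2, 350.0)
]

caching_candidates(stats, top: 2).each do |stat|
  puts format("%8.1f ms/min  %s", stat.total_ms_per_min, stat.sql)
end
```

With these sample numbers, the slow but rare audit_logs query ranks last, while the cheap but constantly repeated lookups come out on top, which is exactly the kind of query the find_cached approach targets.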
Happy caching!