How ActiveRecord Uses Caching To Avoid Unnecessary Trips To The Database


Honeybadger Staff

Posted on March 2, 2021


This article was originally written by Jonathan Miles on the Honeybadger Developer Blog.

A general way to describe caching is storing the result of some code so that we can quickly retrieve it later. In some cases, this means storing a computed value to avoid needing to recompute it later. However, we can also cache data by simply keeping it in memory, without performing any computations, to avoid having to read from a hard drive or perform a network request.

This latter form is particularly relevant for ActiveRecord, where the database often runs on a separate server. Every query therefore incurs network overhead, not to mention the load it places on the database server each time it is performed.

Fortunately, for Rails developers, ActiveRecord itself already handles a lot of this for us, perhaps without us even being conscious of it. This is nice for productivity, but sometimes it's important to know what is being cached behind the scenes: for example, when you know (or expect) that a value is being changed by another process, or when you absolutely must have the most up-to-date value. For cases like these, ActiveRecord provides a couple of 'escape hatches' to force an uncached read of the data.

ActiveRecord's Lazy Evaluation

ActiveRecord's lazy evaluation is not caching per se, but we will be encountering it in code examples later on, so we'll provide a brief overview. When you construct an ActiveRecord query, in many cases, the code does not issue an immediate call to the database. This is what allows us to chain multiple .where clauses without having to hit the database each time:

@posts = Post.where(published: true)
# no DB hit yet
@posts = @posts.where(published_at: Date.today)
# still nothing
@posts.count
# SELECT COUNT(*) FROM "posts" WHERE...

There are some exceptions to this. Methods such as .find, .find_by, .pluck, .to_a, and .first run the query immediately and return records or arrays rather than a relation, so you can't chain additional clauses onto their results. In most of the examples below, I will use .to_a as a simple way to force a DB call.
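To make the idea concrete, here is a plain-Ruby toy (no Rails needed) that mimics the behavior described above: chaining .where only records conditions, and nothing is "sent to the database" until a method like .count or .to_a needs the data. ToyRelation, POSTS, and QUERY_LOG are hypothetical names, and the real ActiveRecord::Relation is far more involved; treat this as a sketch of the concept only.

```ruby
# Toy model of lazy query building; this is NOT ActiveRecord's implementation.
# ToyRelation, POSTS and QUERY_LOG are hypothetical names for illustration.
POSTS = [
  { id: 1, published: true,  published_at: "2021-03-02" },
  { id: 2, published: true,  published_at: "2021-03-01" },
  { id: 3, published: false, published_at: "2021-03-02" }
]
QUERY_LOG = [] # every simulated database hit is recorded here

class ToyRelation
  def initialize(conditions = {})
    @conditions = conditions
  end

  # Chaining only merges conditions into a new relation; no query runs.
  def where(new_conditions)
    ToyRelation.new(@conditions.merge(new_conditions))
  end

  # Methods that need actual data are the ones that "hit the database".
  def count
    log_query("COUNT(*)")
    POSTS.count { |row| matches?(row) }
  end

  def to_a
    log_query("*")
    POSTS.select { |row| matches?(row) }
  end

  private

  def matches?(row)
    @conditions.all? { |column, value| row[column] == value }
  end

  def log_query(select_list)
    sql = "SELECT #{select_list} FROM posts"
    unless @conditions.empty?
      sql += " WHERE " + @conditions.map { |col, val| "#{col} = #{val.inspect}" }.join(" AND ")
    end
    QUERY_LOG << sql
  end
end

posts = ToyRelation.new.where(published: true)
posts = posts.where(published_at: "2021-03-02")
puts QUERY_LOG.size # prints 0: building the relation ran nothing
puts posts.count    # prints 1, and logs a single COUNT query
```

The design choice to return a new relation from .where (rather than mutating the existing one) is what makes chaining cheap: each step is just an object holding conditions until something asks for rows.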

Note that if you are experimenting with this in a Rails console, you will need to turn off 'echo' mode. Otherwise, the console (either irb or pry) calls .inspect on the object once you hit 'enter', which forces a DB query.
To disable echo mode, you can use the following code:

conf.echo = false # for irb
pry_instance.config.print = proc {} # for pry

ActiveRecord Relations

The first part of ActiveRecord's built-in caching we'll look at is association caching. Suppose we have a typical User-Posts relationship:

# app/models/user.rb
class User < ApplicationRecord
  has_many :posts
end

# app/models/post.rb
class Post < ApplicationRecord
  belongs_to :user
end

This gives us the handy user.posts and post.user methods, which query the database for the related record(s). Let's say we're using these in a controller and view:

# app/controllers/posts_controller.rb
class PostsController < ApplicationController
  def index
    @user = User.find(params[:user_id])
    @posts = @user.posts
  end
...

# app/views/posts/index.html.erb
...
<%= render 'shared/sidebar' %>
<% @posts.each do |post| %>
  <%= render post %>
<% end %>

# app/views/shared/_sidebar.html.erb
...
<% @posts.each do |post| %>
  <li><%= post.title %></li>
<% end %>

We have a basic index action that grabs @user.posts. As in the previous section, the database query has not run at this point. Rails then renders our index view, which, in turn, renders the sidebar. The sidebar calls @posts.each ..., and at this point, ActiveRecord fires off the database query to get the data.

We then return to the rest of our index template, where we have another @posts.each; however, this time, there is no database call. ActiveRecord keeps the loaded posts in memory on the association and reuses them rather than reading from the database again.
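The behavior above boils down to memoization: the association object loads its records on first use and keeps them. Here is a plain-Ruby sketch of that idea, assuming hypothetical ToyUser and ToyPostsAssociation classes and an in-memory POSTS_TABLE standing in for the database; it is not how ActiveRecord's CollectionProxy is actually written.

```ruby
# Toy model of association caching; NOT ActiveRecord's real code.
# ToyUser, ToyPostsAssociation, POSTS_TABLE and QUERY_LOG are hypothetical.
POSTS_TABLE = [
  { id: 1, user_id: 1, title: "Hello" },
  { id: 2, user_id: 1, title: "Caching" },
  { id: 3, user_id: 2, title: "Other user" }
]
QUERY_LOG = []

class ToyPostsAssociation
  include Enumerable

  def initialize(user_id)
    @user_id = user_id
    @records = nil # nothing loaded yet
  end

  # The first enumeration loads the records; later ones reuse them.
  def each(&block)
    load_target.each(&block)
  end

  private

  def load_target
    @records ||= begin
      QUERY_LOG << "SELECT * FROM posts WHERE user_id = #{@user_id}"
      POSTS_TABLE.select { |row| row[:user_id] == @user_id }
    end
  end
end

class ToyUser
  def initialize(id)
    @id = id
  end

  # Memoized per instance, so repeated user.posts calls share one cache.
  def posts
    @posts ||= ToyPostsAssociation.new(@id)
  end
end

user = ToyUser.new(1)
posts = user.posts
posts.each { |post| puts post[:title] } # "sidebar": runs the query
posts.each { |post| puts post[:title] } # "index": served from memory
puts QUERY_LOG.size # prints 1
```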

Escape Hatch

There are times when we may want to force ActiveRecord to fetch the associated records again, perhaps because we know they are being changed by another process (a background job, for example). Another common situation is in automated tests, where we want to read the latest value from the database to verify that the code has updated it correctly.

There are two common ways to do this, depending on the situation. I think the most common way is simply to call .reload on the association, which tells ActiveRecord that we want to ignore whatever it has cached and get the latest version from the database:

@user = User.find(1)
@user.posts.to_a # DB call (to_a forces the load)
@user.posts.to_a # Cached, no DB call
@user.posts.reload # DB call
@user.posts.to_a # Cached new version, no DB call

Another option is to simply get a new instance of the ActiveRecord model (e.g., by calling find again):

@user = User.find(1)
@user.posts.to_a # DB call (to_a forces the load)
@user.posts.to_a # Cached, no DB call
@user = User.find(1) # @user is now a new instance of User
@user.posts.to_a # DB call, no cache in this instance
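Both escape hatches can be sketched with the same kind of toy: reload throws away the memoized records and queries again, while a fresh owner instance simply starts with an empty cache. ToyUser, ToyPosts, POSTS_TABLE, and QUERY_LOG are hypothetical names, and the "background job" is just an append to an in-memory array; this only illustrates the idea, not ActiveRecord's code.

```ruby
# Toy sketch of the two escape hatches; NOT ActiveRecord's code.
# ToyUser, ToyPosts, POSTS_TABLE and QUERY_LOG are hypothetical names.
POSTS_TABLE = [{ user_id: 1, title: "First" }]
QUERY_LOG = []

class ToyPosts
  def initialize(user_id)
    @user_id = user_id
  end

  # Load once and keep the result in memory.
  def to_a
    @records ||= fetch
  end

  # Discard whatever is cached and query again.
  def reload
    @records = fetch
    self
  end

  private

  def fetch
    QUERY_LOG << "SELECT * FROM posts WHERE user_id = #{@user_id}"
    POSTS_TABLE.select { |row| row[:user_id] == @user_id }
  end
end

class ToyUser
  def self.find(id)
    new(id) # a real find would also query the users table
  end

  def initialize(id)
    @id = id
  end

  def posts
    @posts ||= ToyPosts.new(@id)
  end
end

user = ToyUser.find(1)
user.posts.to_a                                       # query 1
POSTS_TABLE << { user_id: 1, title: "Added by a background job" }
puts user.posts.to_a.size                             # prints 1: stale cached copy
puts user.posts.reload.to_a.size                      # prints 2: query 2
puts ToyUser.find(1).posts.to_a.size                  # prints 2: query 3, fresh instance
puts QUERY_LOG.size                                   # prints 3
```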

Caching associations is useful, but we often end up writing more complicated .where(...) queries that go beyond simple association lookups. This is where ActiveRecord's SQL cache comes in.

ActiveRecord's SQL Cache

ActiveRecord keeps an internal cache of the queries it has performed to improve performance. Note, however, that this cache is tied to the particular action: it is created at the start of the action and destroyed at the end. This means you will only see it at work if you are performing the same query twice within one controller action (and any write in between, such as an insert, update, or delete, clears the cache). It also means the cache is not used in the Rails console. Cache hits are marked with CACHE in the Rails log. For example, this controller code

class PostsController < ApplicationController
  def index
    ...
    Post.all.to_a # to_a to force DB query

    ...
    Post.all.to_a # to_a to force DB query

produces the following log output:

  Post Load (2.1ms)  SELECT "posts".* FROM "posts"
  ↳ app/controllers/posts_controller.rb:11:in `index'
  CACHE Post Load (0.0ms)  SELECT "posts".* FROM "posts"
  ↳ app/controllers/posts_controller.rb:13:in `index'

You can take a peek at what's inside the cache for an action by printing out ActiveRecord::Base.connection.query_cache (or ActiveRecord::Base.connection.query_cache.keys for just the SQL queries).
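Conceptually, the SQL cache is a hash keyed by the SQL string that only exists for the duration of an action. The plain-Ruby sketch below models that, assuming a hypothetical ToyConnection class whose cache block stands in for the per-action setup and teardown Rails does for you; the real query cache also keys on bind parameters and is cleared on writes, which this toy omits.

```ruby
# Toy model of a per-action SQL cache; NOT ActiveRecord's implementation.
# ToyConnection and its methods are hypothetical names for illustration.
class ToyConnection
  attr_reader :log, :query_cache

  def initialize(rows)
    @rows = rows
    @log = []
    @query_cache = {} # SQL string => result
    @cache_enabled = false
  end

  # Stands in for what happens around a controller action: switch the
  # cache on, run the action, then throw the cache away.
  def cache
    @cache_enabled = true
    yield
  ensure
    @cache_enabled = false
    @query_cache.clear
  end

  def select_all(sql)
    if @cache_enabled && @query_cache.key?(sql)
      @log << "CACHE #{sql}"
      return @query_cache[sql]
    end

    @log << sql
    result = @rows.dup # stand-in for a real database round trip
    @query_cache[sql] = result if @cache_enabled
    result
  end
end

connection = ToyConnection.new([{ id: 1 }, { id: 2 }])
sql = 'SELECT "posts".* FROM "posts"'

connection.cache do                # one "controller action"
  connection.select_all(sql)       # goes to the "database"
  connection.select_all(sql)       # served from the cache
  p connection.query_cache.keys    # the cached SQL strings
end
connection.select_all(sql)         # outside an action: no caching
puts connection.log
```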

Escape Hatch

You probably won't need to bypass the SQL cache very often, but if you do, you can force ActiveRecord to skip it by wrapping the query in the uncached method on ActiveRecord::Base:

class PostsController < ApplicationController
  def index
    ...
    Post.all.to_a # to_a to force DB query

    ...
    ActiveRecord::Base.uncached do
      Post.all.to_a # to_a to force DB query
    end

Because uncached is a method on ActiveRecord::Base, you can also call it via one of your model classes if that improves readability; for example:

  Post.uncached do
    Post.all.to_a
  end
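The effect of uncached can be pictured as temporarily switching the cache off for the duration of a block and restoring the previous setting afterwards. This is a minimal sketch under that assumption, using a hypothetical ToyQueryCache class; it is not ActiveRecord's implementation, and it pretends we are already inside a controller action with caching enabled.

```ruby
# Toy sketch of an `uncached` escape hatch; NOT ActiveRecord's code.
# ToyQueryCache and its methods are hypothetical names.
class ToyQueryCache
  attr_reader :log

  def initialize
    @store = {}
    @enabled = true # pretend we're inside a controller action
    @log = []
  end

  # Temporarily switch caching off, restoring the previous setting afterwards.
  def uncached
    previous = @enabled
    @enabled = false
    yield
  ensure
    @enabled = previous
  end

  def select_all(sql)
    if @enabled && @store.key?(sql)
      @log << "CACHE #{sql}"
      return @store[sql]
    end
    @log << sql
    result = ["row for #{sql}"] # stand-in for a real database round trip
    @store[sql] = result if @enabled
    result
  end
end

db = ToyQueryCache.new
sql = 'SELECT "posts".* FROM "posts"'
db.select_all(sql)                  # database
db.uncached { db.select_all(sql) }  # database again, cache ignored
db.select_all(sql)                  # cache hit from the first query
puts db.log
```

Using ensure means the previous setting is restored even if the block raises, which is the behavior you'd want from any "temporarily change a setting" helper.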

Counter Cache

It's pretty common in web applications to want to count the records in a relationship (e.g., a user has X posts or a team account has Y users). Because this is so common, ActiveRecord includes a way to keep a counter up to date automatically, so that you don't have a bunch of .count calls using up database resources. It only takes a couple of steps to enable. First, we add the counter_cache option to the belongs_to association so that ActiveRecord knows to cache the count for us:

class Post < ApplicationRecord
  belongs_to :user, counter_cache: true
end
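The idea behind counter_cache: true is that creating or destroying a post adjusts a counter stored on its user, so reading the count later needs no COUNT query. In a real app ActiveRecord updates that column in the database; in this plain-Ruby sketch a posts_count attribute on hypothetical ToyUser and ToyPost classes stands in for it, and the counter is assumed to start at 0.

```ruby
# Toy model of a counter cache; NOT ActiveRecord's code. Creating or
# destroying a child updates a posts_count value on its parent.
# ToyUser and ToyPost are hypothetical classes for illustration.
class ToyUser
  attr_accessor :posts_count

  def initialize
    @posts_count = 0 # assumes the counter column defaults to 0
  end
end

class ToyPost
  attr_reader :user

  # Roughly the bookkeeping `belongs_to :user, counter_cache: true` arranges.
  def self.create(user:)
    post = new(user)
    user.posts_count += 1
    post
  end

  def initialize(user)
    @user = user
  end

  def destroy
    user.posts_count -= 1
  end
end

user = ToyUser.new
first = ToyPost.create(user: user)
ToyPost.create(user: user)
first.destroy
puts user.posts_count # prints 1, with no COUNT query needed
```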

We also need to add a new column to the users table, where the count will be stored. By default, ActiveRecord expects it to be named posts_count (read as user.posts_count). You can pass a symbol to counter_cache to specify a different column name if needed.

rails generate migration AddPostsCountToUsers posts_count:integer
rails db:migrate

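Note that the generated migration doesn't set a column default, so the new column starts out as NULL rather than 0; it's worth adding default: 0, null: false to the migration before running it. Either way, users who already have posts will have counters that don't reflect them. ActiveRecord provides a reset_counters method to handle the nitty-gritty details, so you just need to pass it a record's ID and the name of the association whose counter should be updated: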

# find_each loads users in batches instead of all at once
User.find_each do |user|
  User.reset_counters(user.id, :posts)
end
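In effect, resetting a counter means recounting the child rows and overwriting the stored value. The following plain-Ruby sketch shows that idea with in-memory USERS and POSTS arrays and a hypothetical toy_reset_counters helper; it is not the real reset_counters, which works through SQL against your tables.

```ruby
# Toy sketch of what resetting a counter amounts to; NOT ActiveRecord's code.
# USERS, POSTS and toy_reset_counters are hypothetical names.
USERS = [
  { id: 1, posts_count: 0 }, # stale: existing posts were never counted
  { id: 2, posts_count: 0 }
]
POSTS = [
  { id: 1, user_id: 1 },
  { id: 2, user_id: 1 },
  { id: 3, user_id: 2 }
]

# Recount this user's posts and overwrite the stored counter.
def toy_reset_counters(user_id)
  user = USERS.find { |row| row[:id] == user_id }
  user[:posts_count] = POSTS.count { |post| post[:user_id] == user_id }
end

# Mirrors the loop above: one reset per user.
USERS.each { |user| toy_reset_counters(user[:id]) }
p USERS.map { |user| user[:posts_count] } # prints [2, 1]
```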

Finally, we have to check the places where this count is being used. Calling .count on the association bypasses the counter and always runs a SQL COUNT(*) query. Instead, we can use .size, which uses the counter cache if it exists. As an aside, you may want to default to using .size everywhere: if the association's records are already loaded, it counts them in memory rather than making another trip to the database.
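The difference between .count and .size can be sketched as a decision order: .size prefers records already loaded in memory, then the counter cache, and only then falls back to a COUNT query, while .count always queries. This is a simplified plain-Ruby model using a hypothetical ToyPostsAssociation class and QUERY_LOG, not ActiveRecord's actual code, which handles more cases.

```ruby
# Toy comparison of .count and .size on an association; NOT ActiveRecord's
# code. ToyPostsAssociation and QUERY_LOG are hypothetical names.
QUERY_LOG = []

class ToyPostsAssociation
  def initialize(rows, counter_cache: nil)
    @rows = rows
    @counter_cache = counter_cache # value of users.posts_count, if the column exists
    @loaded = nil
  end

  # Loads the records and keeps them in memory.
  def to_a
    @loaded ||= begin
      QUERY_LOG << "SELECT posts.*"
      @rows.dup
    end
  end

  # Always asks the database.
  def count
    QUERY_LOG << "SELECT COUNT(*)"
    @rows.size
  end

  # Prefers in-memory records, then the counter cache, then a COUNT query.
  def size
    return @loaded.size if @loaded
    return @counter_cache if @counter_cache
    count
  end
end

with_counter = ToyPostsAssociation.new([{ id: 1 }, { id: 2 }], counter_cache: 2)
with_counter.size  # no query: uses the counter
with_counter.count # COUNT query

without_counter = ToyPostsAssociation.new([{ id: 1 }])
without_counter.to_a # loads the records
without_counter.size # no query: counts the loaded records
p QUERY_LOG # prints ["SELECT COUNT(*)", "SELECT posts.*"]
```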

Conclusion

For the most part, ActiveRecord's internal caching "just works". I can't say I've seen many cases that need to bypass it, but as with all things, knowing what goes on under the hood can save you some time and agony when you stumble into a situation that requires something out of the ordinary.

Of course, the database is not the only place where Rails is doing some behind-the-scenes caching for us. The HTTP specification includes headers that can be sent between the client and server to avoid having to re-send data that hasn't changed. In the next article in this series on caching, we'll take a look at the 304 (Not Modified) HTTP status code, how Rails handles it for you, and how you can tweak this handling.
