How ActiveRecord Uses Caching To Avoid Unnecessary Trips To The Database
Honeybadger Staff
Posted on March 2, 2021
This article was originally written by Jonathan Miles on the Honeybadger Developer Blog.
A general way to describe caching is storing the result of some code so that we can quickly retrieve it later. In some cases, this means storing a computed value to avoid needing to recompute it later. However, we can also cache data by simply keeping it in memory, without performing any computations, to avoid having to read from a hard drive or perform a network request.
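As a minimal sketch of that idea in plain Ruby (CachedLookup and its loader block are hypothetical names used only for illustration, and the string returned by the loader stands in for a slow disk or network read):

```ruby
# Keeps the result of a slow lookup in memory so repeat requests skip the loader.
class CachedLookup
  attr_reader :loads

  def initialize(&loader)
    @loader = loader # the "expensive" operation, e.g. a network request
    @store = {}      # in-memory cache, keyed by the lookup argument
    @loads = 0       # how many times the loader actually ran
  end

  def fetch(key)
    return @store[key] if @store.key?(key) # cache hit: no loader call

    @loads += 1
    @store[key] = @loader.call(key)        # cache miss: load and remember
  end
end

lookup = CachedLookup.new { |id| "row #{id}" }
first = lookup.fetch(1)  # runs the loader
second = lookup.fetch(1) # served from memory
puts lookup.loads        # the loader ran only once
```

The trade-off is the usual one for any cache: the second read is cheap, but it can no longer see changes made at the source after the first read.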
This latter form is particularly relevant for ActiveRecord, where the database often runs on a separate server. Thus, all requests incur network-traffic overhead, not to mention the load placed on the database server when the query is performed again.
Fortunately for Rails developers, ActiveRecord itself already handles a lot of this for us, perhaps without us even being conscious of it. This is nice for productivity, but sometimes it's important to know what is being cached behind the scenes, for example, when you know (or expect) that a value is being changed by another process, or when you absolutely must have the most up-to-date value. In cases like these, ActiveRecord provides a couple of 'escape hatches' to force an uncached read of the data.
ActiveRecord's Lazy Evaluation
ActiveRecord's lazy evaluation is not caching per se, but we will be encountering it in code examples later on, so we'll provide a brief overview. When you construct an ActiveRecord query, in many cases, the code does not issue an immediate call to the database. This is what allows us to chain multiple .where clauses without having to hit the database each time:
@posts = Post.where(published: true)
# no DB hit yet
@posts = @posts.where(published_at: Date.today)
# still nothing
@posts.count
# SELECT COUNT(*) FROM "posts" WHERE...
There are some exceptions to this. For example, .find, .find_by, .pluck, .to_a, and .first run the query immediately and return records or values rather than a relation, so it is impossible to chain additional clauses after them. In most of the examples below, I will be using .to_a as a simple way to force a DB call.
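The deferred behaviour described above can be modelled in a few lines of plain Ruby. This is only a simplified sketch, not how ActiveRecord is implemented: LazyRelation, FAKE_DB, and queries_run are hypothetical names, and an in-memory array stands in for the database.

```ruby
# An in-memory array standing in for the posts table.
FAKE_DB = [
  { title: "Draft", published: false, author: "ann" },
  { title: "Hello", published: true,  author: "ann" },
  { title: "World", published: true,  author: "bob" }
].freeze

class LazyRelation
  @queries_run = 0
  class << self
    attr_accessor :queries_run # counts how often the "database" is scanned
  end

  def initialize(conditions = {})
    @conditions = conditions
  end

  # Returns a NEW relation with the extra conditions merged in; nothing runs yet.
  def where(extra)
    LazyRelation.new(@conditions.merge(extra))
  end

  # Only here is the "query" executed.
  def to_a
    self.class.queries_run += 1
    FAKE_DB.select { |row| @conditions.all? { |key, value| row[key] == value } }
  end
end

posts = LazyRelation.new.where(published: true)
posts = posts.where(author: "ann")
queries_before = LazyRelation.queries_run # chaining alone has run nothing
titles = posts.to_a.map { |row| row[:title] }
puts queries_before
puts titles.inspect
```

Each call to where only accumulates conditions, which is why chaining is free; the cost is paid once, when something finally asks for the rows.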
Note that if you are experimenting with this in a Rails console, you will need to turn off 'echo' mode. Otherwise, the console (either irb or pry) calls .inspect on the object once you hit 'enter', which forces a DB query.
To disable echo mode, you can use the following code:
conf.echo = false # for irb
pry_instance.config.print = proc {} # for pry
ActiveRecord Relations
The first part of ActiveRecord's built-in caching we'll look at is relations. For example, we have a typical User-Posts relationship:
# app/models/user.rb
class User < ApplicationRecord
  has_many :posts
end

# app/models/post.rb
class Post < ApplicationRecord
  belongs_to :user
end
This gives us the handy user.posts and post.user methods to perform a database query to find the related record(s). Let's say we're using these in a controller and view:
# app/controllers/posts_controller.rb
class PostsController < ApplicationController
  def index
    @user = User.find(params[:user_id])
    @posts = @user.posts
  end
  ...

# app/views/posts/index.html.erb
...
<%= render 'shared/sidebar' %>
<% @posts.each do |post| %>
  <%= render post %>
<% end %>

# app/views/shared/_sidebar.html.erb
...
<% @posts.each do |post| %>
  <li><%= post.title %></li>
<% end %>
We have a basic index action that grabs @user.posts. As in the previous section, the database query has not been run at this point. Rails then renders our index view, which, in turn, renders the sidebar. The sidebar calls @posts.each ..., and at this point, ActiveRecord fires off the database query to get the data.
We then return to the rest of our index template, where we have another @posts.each; however, this time, there is no database call. What's happening is that ActiveRecord is caching all these posts for us and does not bother trying to read from the database again.
Escape Hatch
There are times when we may want to force ActiveRecord to fetch the associated records again; perhaps we know they are being changed by another process (a background job, for example). Another common situation is in automated tests, where we want to get the latest value in the database to validate that the code has updated it correctly.
There are two common ways to do this, depending on the situation. I think the most common way is simply to call .reload on the association, which tells ActiveRecord that we want to ignore whatever it has cached and get the latest version from the database:
@user = User.find(1)
@user.posts # DB Call
@user.posts # Cached, no DB call
@user.posts.reload # DB call
@user.posts # Cached new version, no DB call
Another option is to simply get a new instance of the ActiveRecord model (e.g., by calling find again):
@user = User.find(1)
@user.posts # DB Call
@user.posts # Cached, no DB call
@user = User.find(1) # @user is now a new instance of User
@user.posts # DB Call, no cache in this instance
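To make the mechanism concrete, here is a simplified plain-Ruby model of a per-instance association cache and both escape hatches. It is a sketch under stated assumptions rather than ActiveRecord's implementation: ToyUser, reload_posts, and db_calls are hypothetical names (the real API is user.posts.reload), and a plain array stands in for the posts table.

```ruby
class ToyUser
  attr_reader :db_calls

  def initialize(post_table)
    @post_table = post_table # stands in for the posts table
    @db_calls = 0            # how many times we "queried" it
    @posts = nil             # the cached association, empty until first use
  end

  # First call loads and remembers the records; later calls reuse them.
  def posts
    @posts ||= load_posts
  end

  # Escape hatch: throw away the cached records and load them again.
  def reload_posts
    @posts = nil
    posts
  end

  private

  def load_posts
    @db_calls += 1
    @post_table.dup # a snapshot, like rows read at query time
  end
end

table = ["First post"]
user = ToyUser.new(table)
user.posts
user.posts
calls_after_two_reads = user.db_calls # only the first read loaded anything

table << "Second post"                # "another process" writes a new row
stale = user.posts                    # still the cached snapshot
fresh = user.reload_posts             # forces a second load
other = ToyUser.new(table).posts      # a new instance starts with no cache
puts stale.length
puts fresh.length
```

The cache lives on the instance, which is why both reloading and building a new instance give an up-to-date view.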
Caching relationships is good, but we often end up with complicated .where(...) queries beyond simple relationship lookups. This is where ActiveRecord's SQL cache comes in.
ActiveRecord's SQL Cache
ActiveRecord keeps an internal cache of the queries it has performed to improve performance. Note, however, that this cache is tied to the particular action; it is created at the start of the action and destroyed at the end of the action. This means you will only see it take effect if you are performing the same query twice within one controller action. It also means the cache is not used in the Rails console. Cache hits are shown in the Rails log with a CACHE prefix. For example,
class PostsController < ApplicationController
  def index
    ...
    Post.all.to_a # to_a to force DB query
    ...
    Post.all.to_a # to_a to force DB query
produces the following log output:
Post Load (2.1ms) SELECT "posts".* FROM "posts"
↳ app/controllers/posts_controller.rb:11:in `index'
CACHE Post Load (0.0ms) SELECT "posts".* FROM "posts"
↳ app/controllers/posts_controller.rb:13:in `index'
You can actually take a peek at what's inside the cache for an action by printing out ActiveRecord::Base.connection.query_cache (or ActiveRecord::Base.connection.query_cache.keys for just the SQL query).
Escape Hatch
There are probably not many reasons you would need to bypass the SQL cache, but nevertheless, you can force ActiveRecord to bypass it by using the uncached method on ActiveRecord::Base:
class PostsController < ApplicationController
  def index
    ...
    Post.all.to_a # to_a to force DB query
    ...
    ActiveRecord::Base.uncached do
      Post.all.to_a # to_a to force DB query
    end
As it's a method on ActiveRecord::Base, you could also call it via one of your model classes if it improves readability; for example,
Post.uncached do
  Post.all.to_a
end
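The behaviour of the SQL cache and its escape hatch can be sketched with a small plain-Ruby model. This is an illustration only, not ActiveRecord's code: ToyConnection, select_all, and db_hits are hypothetical names, the cache block stands in for "the duration of one controller action", and the returned string stands in for a result set.

```ruby
class ToyConnection
  attr_reader :db_hits

  def initialize
    @query_cache = {}      # SQL string => result
    @cache_enabled = false
    @db_hits = 0           # how many queries reached the "database"
  end

  # Enable the cache for the duration of the block, then discard its contents.
  def cache
    previous = @cache_enabled
    @cache_enabled = true
    yield
  ensure
    @cache_enabled = previous
    @query_cache.clear unless @cache_enabled
  end

  # Escape hatch: temporarily turn the cache off inside the block.
  def uncached
    previous = @cache_enabled
    @cache_enabled = false
    yield
  ensure
    @cache_enabled = previous
  end

  def select_all(sql)
    return run(sql) unless @cache_enabled

    @query_cache[sql] ||= run(sql) # identical SQL is served from the cache
  end

  private

  def run(sql)
    @db_hits += 1
    "rows for: #{sql}"
  end
end

conn = ToyConnection.new
sql = 'SELECT "posts".* FROM "posts"'

conn.cache do
  conn.select_all(sql)                   # miss: reaches the "database"
  conn.select_all(sql)                   # hit: served from the cache
  conn.uncached { conn.select_all(sql) } # bypass: reaches the "database" again
end
hits_inside_action = conn.db_hits

conn.select_all(sql)                     # outside the block nothing is cached
puts hits_inside_action
puts conn.db_hits
```

Because the cache is keyed on the SQL text and scoped to the block, it only helps when the exact same query repeats within one unit of work, which matches the behaviour visible in the log output earlier.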
Counter Cache
It's pretty common in web applications to want to count the records in a relationship (e.g., a user has X posts or a team account has Y users). Due to how common it is, ActiveRecord includes a way to automatically keep a counter up-to-date so that you don't have a bunch of .count calls using up database resources. It only takes a couple of steps to enable it. First, we add counter_cache to the relationship so that ActiveRecord knows to cache the count for us:
class Post < ApplicationRecord
  belongs_to :user, counter_cache: true
end
We also need to add a new column to the users table, where the count will be stored. In our example, this will be posts_count. You can pass a symbol to counter_cache to specify the column name if needed.
rails generate migration AddPostsCountToUsers posts_count:integer
rails db:migrate
The new column does not yet reflect any posts that already exist (and unless the migration gives it a default such as 0, it starts out empty). If your application already has some posts, you'll need to update the counters. ActiveRecord provides a reset_counters method to handle the nitty-gritty details, so you just need to pass it IDs and tell it which counter to update:
# find_each loads users in batches rather than all at once
User.find_each do |user|
  User.reset_counters(user.id, :posts)
end
Finally, we have to check the places where this count is being used. This is because calling .count will bypass the counter and will always run a COUNT(*) SQL query. Instead, we can use .size, which knows to use the counter cache if it exists. As an aside, you may want to default to using .size everywhere, as it also doesn't reload associations if they are already present, potentially saving a trip to the database.
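The difference between the two calls can be sketched with a toy model in plain Ruby. It is a simplified illustration, not ActiveRecord's implementation: ToyAuthor, add_post, remove_post, reset_counter, and count_queries are hypothetical names, and the internal array stands in for the posts table.

```ruby
class ToyAuthor
  attr_reader :posts_count, :count_queries

  def initialize
    @posts = []        # stands in for rows in the posts table
    @posts_count = 0   # the cached counter column
    @count_queries = 0 # how many times we had to recount
  end

  # Writes keep the counter in step with the data.
  def add_post(title)
    @posts << title
    @posts_count += 1
  end

  def remove_post(title)
    @posts_count -= 1 if @posts.delete(title)
  end

  # Like .count: always recounts the underlying rows.
  def count
    @count_queries += 1
    @posts.length
  end

  # Like .size with a counter cache: just reads the stored number.
  def size
    @posts_count
  end

  # Like reset_counters: recompute the stored number from the rows.
  def reset_counter
    @posts_count = @posts.length
  end
end

author = ToyAuthor.new
author.add_post("One")
author.add_post("Two")
author.remove_post("One")

size = author.size                    # reads the counter, no recount
queries_after_size = author.count_queries
count = author.count                  # recounts the rows
puts size
puts count
```

The counter is only trustworthy while every write goes through code that updates it, which is why a one-off reset is needed for rows that existed before the counter did.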
Conclusion
For the most part, ActiveRecord's internal caching "just works". I can't say I've seen many cases that need to bypass it, but as with all things, knowing what goes on "under-the-hood" can save you some time and agony when you stumble into a situation that requires something out of the ordinary.
Of course, the database is not the only place where Rails is doing some behind-the-scenes caching for us. The HTTP specification includes headers that can be sent between the client and server to avoid having to re-send data that hasn't changed. In the next article in this series on caching, we'll take a look at the 304 (Not Modified) HTTP status code, how Rails handles it for you, and how you can tweak this handling.