Arnaud
Posted on April 6, 2021
TL;DR: There are instances where the eager_load and preload Active Record directives are not enough to suit your preloading requirements. Don't be afraid to write your own class methods to define your own preloading strategies. Even if these methods return an array instead of an Active Record relation, they will still be more efficient than having N+1 queries.
Active Record offers two main ways of preventing N+1 queries: eager_load and preload.
The difference between these two is subtle but important:
- eager_load loads related associations via LEFT OUTER JOIN. This is often used for belongs_to associations.
- preload loads related associations by collecting foreign IDs, making a bulk request for the records and injecting them in the parent records. This is often used for has_many associations.
Both of these methods rely on defining "standard" associations (belongs_to and has_many).
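To make the preload strategy concrete, here is a minimal plain-Ruby sketch of its three steps (collect IDs, fetch in bulk, inject). It does not use Rails: the Product and Variant structs, VARIANTS_TABLE and the PreloadSketch module are hypothetical stand-ins, and the real Active Record implementation is more involved.

```ruby
# Plain-Ruby sketch of the "preload" strategy (no Rails required).
# Product, Variant, VARIANTS_TABLE and PreloadSketch are hypothetical stand-ins.
Product = Struct.new(:id, :variants)
Variant = Struct.new(:id, :product_id)

VARIANTS_TABLE = [
  Variant.new(1, 10), Variant.new(2, 10), Variant.new(3, 20)
].freeze

module PreloadSketch
  # Simulates one bulk query: SELECT * FROM variants WHERE product_id IN (...)
  def self.fetch_variants(product_ids)
    VARIANTS_TABLE.select { |v| product_ids.include?(v.product_id) }
  end

  def self.preload_variants(products)
    # 1. Collect the parent IDs
    ids = products.map(&:id)
    # 2. Issue a single bulk request and group the rows by foreign key
    by_product = fetch_variants(ids).group_by(&:product_id)
    # 3. Inject the matching rows into each parent record
    products.each { |p| p.variants = by_product.fetch(p.id, []) }
  end
end

products = PreloadSketch.preload_variants(
  [Product.new(10), Product.new(20), Product.new(30)]
)
products.each { |p| puts "product #{p.id}: #{p.variants.map(&:id).inspect}" }
```

Whatever the number of products, only one lookup is made against the variants "table", which is the property that prevents N+1 queries.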
Now what happens if you have custom associations? A typical example of a custom association is a many-to-many association where the foreign keys are stored on the parent model as an array.
# app/models/product.rb
class Product < ApplicationRecord
  # Fetch all label records associated with the product
  def labels
    @labels ||= Label.where(id: label_ids)
  end
end
Why would you do that? Well, maybe because you don't want to define unnecessary join tables for secondary associations? Maybe because the association needs to be polymorphic (which rules out has_and_belongs_to_many) but is not important enough to deserve a join model? Or maybe because your app was initially designed like this and recently migrated to Rails?
No matter the reason, the question is: how do you still properly preload these custom associations?
And more generally the question is: is it possible to define non-ActiveRecord logic on scopes while still maintaining a pseudo Active Record interface?
Let's see what we can do, using our custom association preloading as an example.
Custom scopes to the rescue
Rails allows you to define custom scopes. These scopes can be used to define reusable filtering options but can also be used to define common preloading strategies.
Here is a basic example:
# app/models/product.rb
class Product < ApplicationRecord
  belongs_to :author
  has_many :variants

  # Example of filtering scope
  scope :active, -> { where(active: true) }

  # Example of preloading scope
  scope :for_api, -> { eager_load(:author).preload(:variants) }
end
If your abstract API controller is configured to invoke the for_api scope on your models, then that's one quick and efficient way of defining how your models should be preloaded when collections are requested on your API.
Now let's consider our Product model with our custom labels association:
# app/models/product.rb
class Product < ApplicationRecord
  belongs_to :author
  has_many :variants

  # Fetch all label records associated with the product
  def labels
    @labels ||= Label.where(id: label_ids)
  end
end
We cannot use eager_load or preload here due to the custom nature of our association. But we can still preload the association in bulk by manually injecting related records.
This is what it looks like:
# app/models/product.rb
class Product < ApplicationRecord
  belongs_to :author
  has_many :variants

  # Define setter to have the ability to inject labels
  attr_writer :labels

  # Define custom preloading scope (using a class method this time)
  # where we preload associations in bulk, including labels
  def self.for_api
    collection = eager_load(:author).preload(:variants)

    # Fetch labels in bulk (similar to preload)
    labels = Label.where(id: collection.flat_map(&:label_ids))

    # Inject labels on each record
    collection.each do |record|
      record.labels = labels.select { |e| record.label_ids.include?(e.id) }
    end

    # Return the collection
    collection
  end

  # Fetch all label records associated with the product
  def labels
    @labels ||= Label.where(id: label_ids)
  end
end
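The bulk-and-inject idea can be tried without a database. The sketch below is plain Ruby, not Rails: LabelStore is a hypothetical stand-in for the labels table that counts "queries", and LabelPreloader is a hypothetical helper. As a variation on the code above, it indexes labels by ID in a hash instead of scanning the label list for every record, which may matter for large collections.

```ruby
# Plain-Ruby sketch (no Rails). LabelStore and LabelPreloader are hypothetical.
Label = Struct.new(:id, :name)

module LabelStore
  ROWS = [Label.new(1, "new"), Label.new(2, "sale"), Label.new(3, "eco")].freeze

  class << self
    attr_accessor :query_count

    # Simulates: SELECT * FROM labels WHERE id IN (...)
    def where_ids(ids)
      self.query_count += 1
      ROWS.select { |label| ids.include?(label.id) }
    end
  end
  self.query_count = 0
end

class Product
  attr_reader :label_ids
  attr_writer :labels

  def initialize(label_ids)
    @label_ids = label_ids
  end

  # Lazy fallback: one query per product when nothing was injected
  def labels
    @labels ||= LabelStore.where_ids(label_ids)
  end
end

module LabelPreloader
  def self.call(products)
    # One bulk query for every label ID, indexed by ID for direct lookups
    ids = products.flat_map(&:label_ids).uniq
    by_id = LabelStore.where_ids(ids).to_h { |label| [label.id, label] }

    # Inject the matching labels into each product
    products.each do |product|
      product.labels = product.label_ids.filter_map { |id| by_id[id] }
    end
  end
end

products = [Product.new([1, 2]), Product.new([2, 3]), Product.new([])]
LabelPreloader.call(products)
products.each { |product| puts product.labels.map(&:name).join(", ") }
puts "queries issued: #{LabelStore.query_count}"
```

Because an injected empty array is still truthy in Ruby, products without labels do not fall back to the lazy query either.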
The concept is simple: you load custom associations in bulk and inject the relevant associated models manually. Chaining works as long as your custom scope is last.
That is:
# Load products and associations / custom associations in one go
# Note that for_api scope is invoked last
Product.where(active: true).where("created_at > ?", 1.month.ago).for_api
# Post-chaining technically "works", but each chained call builds a new relation
# that reloads its records, so the injected labels are not kept.
#
# The following will generate N+1 queries:
Product.for_api.where(active: true).where("created_at > ?", 1.month.ago)
Product.for_api.find_each { |e| ... }
Product.for_api.find_in_batches { |e| ... }
This solution works well when you need to quickly put together a custom scope. But there are two main drawbacks:
- Chaining only works if your custom scope is last
- Pagination using find_in_batches and find_each doesn't work as intended (N+1 will happen because these methods will reset our modified records)
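Both drawbacks come from the same behaviour: a chained call returns a new, not-yet-loaded relation with its own record objects. The following plain-Ruby sketch mimics this. FakeRelation, Record and Injection are hypothetical stand-ins, not Active Record classes.

```ruby
# Plain-Ruby sketch (no Rails). FakeRelation is a hypothetical stand-in that,
# like a relation, instantiates its own record objects when it is loaded.
Record = Struct.new(:id, :labels)

class FakeRelation
  include Enumerable

  def initialize(ids)
    @ids = ids
  end

  # Chaining returns a NEW relation that has not been loaded yet
  def where_id_at_least(min)
    FakeRelation.new(@ids.select { |id| id >= min })
  end

  # Records are instantiated on first load and memoized per relation
  def records
    @records ||= @ids.map { |id| Record.new(id, nil) }
  end

  def each(&block)
    records.each(&block)
  end
end

module Injection
  # Mimics the class-method scope: load the records, inject, return the relation
  def self.for_api(relation)
    relation.each { |record| record.labels = ["label-#{record.id}"] }
    relation
  end
end

loaded = Injection.for_api(FakeRelation.new([1, 2, 3]))
puts loaded.map(&:labels).inspect                       # labels were injected
puts loaded.where_id_at_least(2).map(&:labels).inspect  # fresh records, labels missing
```

The records injected by for_api live on the first relation only. Anything chained afterwards works on fresh records and falls back to the lazy per-record lookup.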
Fixing pagination: the cheap way
If all you need is pagination, there is a quick way to do it using find_in_batches and yield.
Just edit your scope method to call find_in_batches and, based on the presence of a block, either return results or yield them.
# app/models/product.rb
class Product < ApplicationRecord
  belongs_to :author
  has_many :variants

  # Define setter to have the ability to inject labels
  attr_writer :labels

  # Define custom preloading scope (using a class method this time)
  # where we preload associations in bulk, including labels
  def self.for_api
    rs = []

    # Load records in batch
    eager_load(:author).preload(:variants).find_in_batches do |batch|
      # Fetch labels in bulk (similar to preload)
      labels = Label.where(id: batch.flat_map(&:label_ids))

      # Inject labels on each record
      batch.each do |record|
        record.labels = labels.select { |e| record.label_ids.include?(e.id) }
      end

      # Yield batch
      yield(batch) if block_given?

      # Add collection to resultset if no block
      rs += batch unless block_given?
    end

    # No results if block given (same as find_in_batches)
    # Otherwise return accumulated results
    block_given? ? nil : rs
  end

  # Fetch all label records associated with the product
  def labels
    @labels ||= Label.where(id: label_ids)
  end
end
You can then paginate like this:
# Fetch your processed records in batches
Product.where(active: true).for_api do |batch|
  # ... do something with the batch ...
end
This approach is not the most elegant one but is simple enough if you need something off the ground quickly.
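The "yield when a block is given, otherwise accumulate and return" behaviour can be isolated in a few lines of plain Ruby. BatchSketch is a hypothetical helper and the multiplication is only a stand-in for the label injection step.

```ruby
# Plain-Ruby sketch of the yield-or-return pattern (no Rails).
# BatchSketch is a hypothetical helper used for illustration only.
module BatchSketch
  def self.process(items, batch_size: 2)
    results = []

    items.each_slice(batch_size) do |batch|
      processed = batch.map { |n| n * 10 } # stand-in for label injection

      if block_given?
        # Hand each processed batch to the caller
        yield(processed)
      else
        # No block: accumulate everything in memory
        results.concat(processed)
      end
    end

    # Mirror find_in_batches: nothing to return when a block was given
    block_given? ? nil : results
  end
end

puts BatchSketch.process([1, 2, 3]).inspect
BatchSketch.process([1, 2, 3]) { |batch| puts batch.inspect }
```

Note that block_given? and yield inside the each_slice block still refer to the block passed to the enclosing method, which is what makes this pattern work.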
An ActiveRecord-like solution
To get a completely neat solution, we need a class that wraps the results and mimics ActiveRecord::Relation. This approach would allow us to chain our scope wherever we want and use pagination the same way we use it with Active Record.
The following class is exactly that. It's a proxy class that delegates filtering methods to Active Record and defers our custom processing logic until the very end, when results must be returned.
# app/models/relation_processor.rb
# This class is a partial proxy class for ActiveRecord relations
# which lazily applies our scope processing at the very end of the query chain.
class RelationProcessor
  attr_accessor :relation, :processor

  #
  # The class accepts an ActiveRecord relation as argument
  # and a block defining the processing to perform on record
  # batches.
  #
  # @param [ActiveRecord::Relation] relation The initial preloading scope
  # @param [Proc] &block The processing logic to apply. Must return an array of processed records.
  #
  def initialize(relation, &block)
    @relation = relation
    @processor = block
  end

  # Add proxy methods for ActiveRecord filtering methods. This allows
  # us to chain filtering methods after our scope
  #
  # The list of methods could be expanded. But keep in mind that adding
  # methods like 'group' or 'select' will allow users to chain methods
  # that modify the resultset expected by our processor block.
  %i[where limit not].each do |m|
    define_method(m) do |*args|
      self.class.new(relation.send(m, *args), &processor)
    end
  end

  #
  # Return the full resultset with the processing logic applied.
  #
  # @return [Array<any>] The full resultset.
  #
  def resultset
    @resultset ||= processor.call(relation.to_a)
  end

  #
  # Re-implement find_in_batches to apply our processing logic
  # to each batch. Options are forwarded as keyword arguments so that
  # calls such as find_in_batches(batch_size: 10) also work on Ruby 3+.
  #
  # @param [Hash] **options The find_in_batches options (batch_size etc.)
  #
  # @return [void]
  #
  def find_in_batches(**options)
    relation.find_in_batches(**options) do |batch|
      yield(processor.call(batch))
    end
  end

  #
  # Re-implement find_each by leveraging our find_in_batches logic.
  #
  # @param [Hash] **options The find_each options (batch_size etc.)
  #
  # @return [void]
  #
  def find_each(**options)
    find_in_batches(**options) do |batch|
      batch.each { |record| yield(record) }
    end
  end

  #
  # Send any unknown method to the full resultset.
  #
  # @param [Symbol] meth The name of the method.
  # @param [Array<any>] *args The arguments to the method.
  # @param [Proc] &block Any block passed to the method.
  #
  # @return [Any] The result of the method.
  #
  def method_missing(meth, *args, &block)
    if [].respond_to?(meth)
      resultset.public_send(meth, *args, &block)
    else
      super
    end
  end

  #
  # Check if the underlying resultset responds to a given method.
  #
  # @param [Symbol] meth The name of the method.
  #
  # @return [Boolean] True if the resultset responds to the method. False otherwise.
  #
  def respond_to_missing?(meth, include_private = false)
    # Resultset is an array - we do not need to load the full resultset
    # to know which methods it responds to.
    [].respond_to?(meth, include_private)
  end
end
Using this proxy class you can rewrite your Product scope in the following way:
# app/models/product.rb
class Product < ApplicationRecord
  belongs_to :author
  has_many :variants

  # Define setter to have the ability to inject labels
  attr_writer :labels

  # Define custom preloading scope using our new RelationProcessor proxy
  def self.for_api
    RelationProcessor.new(eager_load(:author).preload(:variants)) do |collection|
      # Fetch labels in bulk (similar to preload)
      labels = Label.where(id: collection.flat_map(&:label_ids))

      # Inject labels on each record
      collection.each do |record|
        record.labels = labels.select { |e| record.label_ids.include?(e.id) }
      end
    end
  end

  # Fetch all label records associated with the product
  def labels
    @labels ||= Label.where(id: label_ids)
  end
end
Then use your scope in almost the same way as you would with an Active Record relation:
# Chaining works before and after
Product.where(active: true).for_api
Product.for_api.where(active: true)
Product.for_api.where.not(active: true)

# find_in_batches works as usual
Product.for_api.where.not(active: true).find_in_batches do |batch|
  # ... do something ...
end

# find_each as well
Product.for_api.where(active: true).find_each(batch_size: 10) do |record|
  # ... do something ...
end
Easy!
We can now define custom preloading strategies which rely on non-Rails patterns while still loading data in bulk and benefiting from ActiveRecord-like syntax.
Important note: The proxy above is not a full implementation of the Active Record relation interface. Query directives such as group or select will likely tamper with the data expected by our processing block, so I omitted them from the proxy implementation. Invoking them as query directives on the proxy will raise an error instead of building a query.
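The whitelisting behaviour can be observed with a trimmed-down, plain-Ruby version of the idea. This is only a sketch: ArrayRelation is a hypothetical in-memory stand-in for a relation, and MiniProxy is a reduced illustration rather than the RelationProcessor class itself.

```ruby
# Plain-Ruby sketch (no Rails). ArrayRelation and MiniProxy are hypothetical.
class ArrayRelation
  def initialize(rows)
    @rows = rows
  end

  def where(conditions)
    ArrayRelation.new(@rows.select { |row| conditions.all? { |k, v| row[k] == v } })
  end

  def limit(count)
    ArrayRelation.new(@rows.first(count))
  end

  # Available on the relation but deliberately NOT proxied below
  def group(*)
    self
  end

  def to_a
    @rows
  end
end

class MiniProxy
  def initialize(relation, &processor)
    @relation = relation
    @processor = processor
  end

  # Only whitelisted query methods are forwarded to the relation
  %i[where limit].each do |name|
    define_method(name) do |*args|
      self.class.new(@relation.public_send(name, *args), &@processor)
    end
  end

  # Processing is deferred until results are actually needed
  def resultset
    @resultset ||= @processor.call(@relation.to_a)
  end

  # Array methods go to the processed results, anything else is rejected
  def method_missing(name, *args, &block)
    return super unless [].respond_to?(name)

    resultset.public_send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    [].respond_to?(name, include_private)
  end
end

rows = [{ id: 1, active: true }, { id: 2, active: false }, { id: 3, active: true }]
proxy = MiniProxy.new(ArrayRelation.new(rows)) do |records|
  records.map { |row| row.merge(processed: true) }
end

puts proxy.where(active: true).limit(1).to_a.inspect

begin
  proxy.group(:id)
rescue NoMethodError => e
  puts "group is not proxied: #{e.class}"
end
```

Since group is neither whitelisted nor an Array method, the call falls through method_missing to super and raises NoMethodError.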
I also omitted scoping-specific methods such as unscoped and default_scoped for ease of reading, but these could easily be added to the proxy.
Feel free to expand the proxy implementation to support more use cases.