Complex preloading strategies in Rails using custom Active Record scopes

alachaum

Arnaud

Posted on April 6, 2021

TL;DR; There are instances where the eager_load and preload Active Record directives are not enough to suit your preloading requirements. Don't be afraid to write your own class methods to define your own preloading strategies. Even if these methods return an array instead of an Active Record relation, these methods will still be more efficient than having N+1 queries.

Active Record offers two main ways of preventing N+1 queries: eager_load and preload.

The difference between these two is subtle but important:

  • eager_load loads related associations via LEFT OUTER JOIN. This is often used for belongs_to associations.
  • preload loads related associations by collecting foreign IDs, making a bulk request for the records and injecting them in the parent records. This is often used for has_many associations.

Both of these methods rely on defining "standard" associations (belongs_to and has_many).
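To make the preload mechanics concrete, here is a rough plain-Ruby sketch of the collect, bulk-fetch, inject cycle. It does not use Rails: ProductRow, VariantRow, VARIANTS_TABLE and preload_variants are hypothetical stand-ins, and the in-memory array plays the role of the single bulk query Active Record would issue.

```ruby
# Hypothetical stand-ins for Active Record models (no database involved).
VariantRow = Struct.new(:id, :product_id)
ProductRow = Struct.new(:id, :variants)

# Pretend "table" of variants. In Rails this lookup would be one
# `WHERE product_id IN (...)` query instead of an array scan.
VARIANTS_TABLE = [
  VariantRow.new(1, 10),
  VariantRow.new(2, 10),
  VariantRow.new(3, 20)
].freeze

def preload_variants(products)
  # 1. Collect the IDs of the parent records
  product_ids = products.map(&:id)

  # 2. Fetch all related records in one bulk lookup
  variants = VARIANTS_TABLE.select { |v| product_ids.include?(v.product_id) }

  # 3. Group by foreign key and inject the matches into each parent
  by_product = variants.group_by(&:product_id)
  products.each { |product| product.variants = by_product.fetch(product.id, []) }
end

products = preload_variants([ProductRow.new(10), ProductRow.new(20), ProductRow.new(30)])
```

The number of lookups stays constant no matter how many products are loaded, which is the property we want to keep for custom associations.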

Now what happens if you have custom associations? A typical example of a custom association is a many-to-many association where the foreign keys are stored on the parent model as an array.

# app/models/product.rb

class Product < ApplicationRecord
  # Fetch all label records associated with the product
  def labels
    @labels ||= Label.where(id: label_ids)
  end
end

Why would you do that? Well, maybe because you don't want to define unnecessary join tables for secondary associations? Maybe because the association needs to be polymorphic (= no has_and_belongs_to_many) but is not important enough to deserve a join model? Or maybe because your app was initially designed like this and recently migrated to Rails?

No matter the reason, the question is: how can we still properly preload these custom associations?

And more generally the question is: is it possible to define non-ActiveRecord logic on scopes while still maintaining a pseudo Active Record interface?

Let's see what we can do, using our custom association preloading as an example.

Custom scopes to the rescue

Rails allows you to define custom scopes. These scopes can be used to define reusable filtering options but can also be used to define common preloading strategies.

Here is a basic example:

# app/models/product.rb

class Product < ApplicationRecord
  belongs_to :author
  has_many :variants

  # Example of filtering scope
  scope :active, -> { where(active: true) }

  # Example of preloading scope
  scope :for_api, -> { eager_load(:author).preload(:variants) }
end

If your abstract API controller is configured to invoke the for_api scope on your models, then that's one quick and efficient way of defining how your models should be preloaded when collections are requested on your API.

Now let's consider our Product model with our custom labels association:

# app/models/product.rb

class Product < ApplicationRecord
  belongs_to :author
  has_many :variants

  # Fetch all label records associated with the product
  def labels
    @labels ||= Label.where(id: label_ids)
  end
end

We cannot use eager_load or preload here due to the custom nature of our association. But we can still preload the associations in bulk by fetching the related records ourselves and injecting them into each parent record.

This is what it looks like:

# app/models/product.rb

class Product < ApplicationRecord
  belongs_to :author
  has_many :variants

  # Define setter to have the ability to inject labels
  attr_writer :labels

  # Define custom preloading scope (using a class method this time)
  # where we preload associations in bulk, including labels
  def self.for_api
    collection = eager_load(:author).preload(:variants)

    # Fetch labels in bulk (similar to preload)
    labels = Label.where(id: collection.flat_map(&:label_ids))

    # Inject labels on each record
    collection.each do |record|
      record.labels = labels.select { |e| record.label_ids.include?(e.id) }
    end

    # Return the collection
    collection
  end

  # Fetch all label records associated with the product
  def labels
    @labels ||= Label.where(id: label_ids)
  end
end

The concept is simple: you load custom associations in bulk and inject the relevant associated models manually. Chaining works as long as your custom scope is last.
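One possible refinement of the injection step: the loop above scans the whole labels list once per product. A minimal sketch, assuming label_ids is a plain array of ids and using hypothetical stand-in structs (LabelRow, ProductRow, inject_labels) instead of real models, is to index the labels by id first.

```ruby
# Hypothetical stand-ins for the Label and Product models (no Rails needed).
LabelRow = Struct.new(:id, :name)
ProductRow = Struct.new(:id, :label_ids, :labels)

def inject_labels(products, labels)
  # Build an id => label lookup once instead of scanning `labels` per product
  labels_by_id = labels.to_h { |label| [label.id, label] }

  products.each do |product|
    # filter_map skips ids that have no matching label
    product.labels = product.label_ids.filter_map { |id| labels_by_id[id] }
  end
end

labels = [LabelRow.new(1, "new"), LabelRow.new(2, "sale"), LabelRow.new(3, "eco")]
products = [ProductRow.new(10, [1, 3]), ProductRow.new(20, [2]), ProductRow.new(30, [])]
inject_labels(products, labels)
```

In a Rails app the lookup could be built with `labels.index_by(&:id)` from Active Support. For small collections the difference is negligible, so treat this as an optional optimization.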

That is:

# Load products and associations / custom associations in one go
# Note that for_api scope is invoked last
Product.where(active: true).where("created_at > ?", 1.month.ago).for_api

# Post-chaining "works" technically, but every call made after for_api builds a new
# relation that reloads the records, so the injected labels are lost.
# 
# The following will generate N+1 queries:
Product.for_api.where(active: true).where("created_at > ?", 1.month.ago)
Product.for_api.find_each { |e| ... }
Product.for_api.find_in_batches { |e| ... }

This solution works well when you need to quickly put together a custom scope. But there are two main drawbacks:

  1. Chaining only works if your custom scope is last
  2. Pagination using find_in_batches and find_each doesn't work as intended (N+1 will happen because these methods will reset our modified records)
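Both drawbacks come from the same cause: the injected data lives on the record objects, and any further chaining or batching loads fresh objects. The sketch below imitates that behaviour in plain Ruby. RecordStub and RelationStub are hypothetical stand-ins, not Active Record classes, and the block-based `where` is only there to keep the example short.

```ruby
# Hypothetical stand-in record with an injectable attribute.
class RecordStub
  attr_reader :id
  attr_accessor :labels

  def initialize(id)
    @id = id
  end
end

# Hypothetical stand-in relation: loading instantiates new record objects,
# and `where` returns a new relation that has not been loaded yet.
class RelationStub
  def initialize(ids)
    @ids = ids
  end

  def where
    RelationStub.new(@ids.select { |id| yield(id) })
  end

  def to_a
    @loaded ||= @ids.map { |id| RecordStub.new(id) }
  end
end

relation = RelationStub.new([1, 2, 3])
relation.to_a.each { |record| record.labels = ["injected"] }

# Same relation: the loaded records are memoized, the injection is kept
kept = relation.to_a.map(&:labels)

# Chained relation: new objects are built, the injection is gone
lost = relation.where { |id| id > 1 }.to_a.map(&:labels)
```

With real models, the missing injection means each record falls back to its own `Label.where(id: label_ids)` query, which is where the N+1 comes from.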

Fixing pagination: the cheap way

If all you need is pagination, there is a quick way to do it using find_in_batches and yield.

Just edit your scope method to call find_in_batches and - based on the presence of a block - either return results or yield them.

# app/models/product.rb

class Product < ApplicationRecord
  belongs_to :author
  has_many :variants

  # Define setter to have the ability to inject labels
  attr_writer :labels

  # Define custom preloading scope (using a class method this time)
  # where we preload associations in bulk, including labels
  def self.for_api
    rs = []

    # Load records in batch
    eager_load(:author).preload(:variants).find_in_batches do |batch|
      # Fetch labels in bulk (similar to preload)
      labels = Label.where(id: batch.flat_map(&:label_ids))

      # Inject labels on each record
      batch.each do |record|
        record.labels = labels.select { |e| record.label_ids.include?(e.id) }
      end

      # Yield batch
      yield(batch) if block_given?

      # Add collection to resultset if no block
      rs += batch unless block_given?
    end

    # No results if block given (same as find_in_batches)
    # Otherwise return accumulated results
    block_given? ? nil : rs
  end

  # Fetch all label records associated with the product
  def labels
    @labels ||= Label.where(id: label_ids)
  end
end

You can then paginate like this:

# Fetch your processed records in batches
Product.where(active: true).for_api do |batch|
  # ... do something with the batch ...
end

This approach is not the most elegant one but is simple enough if you need something off the ground quickly.
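The "yield if a block is given, otherwise accumulate" pattern is independent of Rails. Here is a minimal plain-Ruby sketch of it, where process_in_batches is a hypothetical helper, each_slice stands in for find_in_batches, and multiplying by ten stands in for the label injection.

```ruby
# Hypothetical helper illustrating the block-or-return pattern.
def process_in_batches(items, batch_size: 2)
  results = []

  items.each_slice(batch_size) do |batch|
    processed = batch.map { |n| n * 10 } # stand-in for the injection step

    if block_given?
      # Caller wants to stream: hand over the processed batch
      yield processed
    else
      # Caller wants everything: accumulate
      results.concat(processed)
    end
  end

  # Nothing to return when streaming, full list otherwise
  block_given? ? nil : results
end

# Without a block: all processed items are returned
everything = process_in_batches([1, 2, 3])

# With a block: batches are yielded one at a time
batches = []
returned = process_in_batches([1, 2, 3]) { |batch| batches << batch }
```

The trade-off is a method whose return type depends on how it is called, which is why the article treats this as a quick fix rather than the final design.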

An ActiveRecord-like solution

To get a completely neat solution, we need a class that wraps the results and mimics ActiveRecord::Relation. This approach would allow us to chain our scope wherever we want and use pagination the same way we use it with Active Record.

The following class is exactly that. It's a proxy class that delegates filtering methods to Active Record and defers our custom processing logic till the very end, when results must be returned.

# app/models/relation_processor.rb

# This class is a partial proxy class for ActiveRecord relations
# which lazily applies our scope processing at the very end of the query chain.
class RelationProcessor
  attr_accessor :relation, :processor

  #
  # The class accepts an ActiveRecord relation as argument
  # and a block defining the processing to perform on record
  # batches.
  #
  # @param [ActiveRecord::Relation] relation The initial preloading scope
  # @param [Proc] &block The processing logic to apply. Must return an array of processed records.
  #
  def initialize(relation, &block)
    @relation = relation
    @processor = block
  end

  # Add proxy methods for ActiveRecord filtering methods. This allows
  # us to chain filtering methods after our scope
  #
  # The list of methods could be expanded. But keep in mind that adding
  # methods like 'group' or 'select' will allow users to chain methods
  # that modify the resultset expected by our processor block.
  %i[where limit not].each do |m|
    define_method(m) do |*args|
      self.class.new(relation.send(m, *args), &processor)
    end
  end

  #
  # Return the full resultset with the processing logic applied.
  #
  # @return [Array<any>] The full resultset.
  #
  def resultset
    @resultset ||= processor.call(relation.to_a)
  end

  #
  # Re-implement find_in_batches to apply our processing logic
  # to each batch.
  #
  # Options are forwarded as keywords because Active Record's
  # find_in_batches only accepts keyword arguments (required on Ruby 3+).
  #
  # @param [Hash] **options The find_in_batches options (batch_size etc..)
  #
  # @return [void]
  #
  def find_in_batches(**options)
    relation.find_in_batches(**options) do |batch|
      yield(processor.call(batch))
    end
  end

  #
  # Re-implement find_each by leveraging our find_in_batches logic.
  #
  # @param [Hash] **options The find_each options (batch_size etc..)
  #
  # @return [void]
  #
  def find_each(**options)
    find_in_batches(**options) do |batch|
      batch.each { |record| yield(record) }
    end
  end

  #
  # Send any unknown method to the full resultset.
  #
  # @param [Symbol] meth The name of the method.
  # @param [Array<any>] *args The arguments to the method.
  # @param [Proc] &block Any block passed to the method.
  #
  # @return [Any] The result of the method.
  #
  def method_missing(meth, *args, &block)
    if [].respond_to?(meth)
      resultset.public_send(meth, *args, &block)
    else
      super
    end
  end

  #
  # Check if the underlying resultset responds to a given method.
  #
  # @param [String] meth The name of the method.
  #
  # @return [Boolean] True if the resultset responds to the method. False otherwise.
  #
  def respond_to_missing?(meth, include_private = false)
    # Resultset is an array - we do not need to load the full resultset
    # to know which methods it responds to.
    [].respond_to?(meth, include_private)
  end
end

Using this proxy class you can rewrite your Product scope in the following way:

# app/models/product.rb

class Product < ApplicationRecord
  belongs_to :author
  has_many :variants

  # Define setter to have the ability to inject labels
  attr_writer :labels

  # Define custom preloading scope using our new RelationProcessor proxy
  def self.for_api
    RelationProcessor.new(eager_load(:author).preload(:variants)) do |collection|
      # Fetch labels in bulk (similar to preload)
      labels = Label.where(id: collection.flat_map(&:label_ids))

      # Inject labels on each record
      collection.each do |record|
        record.labels = labels.select { |e| record.label_ids.include?(e.id) }
      end
    end
  end

  # Fetch all label records associated with the product
  def labels
    @labels ||= Label.where(id: label_ids)
  end
end

Then use your scope in almost the same way as you would with an Active Record relation:

# Chaining works before and after
Product.where(active: true).for_api
Product.for_api.where(active: true)
Product.for_api.where.not(active: true)

# find_in_batches works as usual
Product.for_api.where.not(active: true).find_in_batches do |batch|
  # ... do something ...
end

# find_each as well
Product.for_api.where(active: true).find_each(batch_size: 10) do |record|
  # ... do something ...
end

Easy!

We can now define custom preloading strategies which rely on non-Rails patterns while still loading data in bulk and benefiting from ActiveRecord-like syntax.

Important note: The proxy above is not a full implementation of the Active Record relation interface. Query directives such as group or select will likely tamper with the data expected by our processing block, so I omitted them from the proxy implementation. Invoking these methods on the query chain will raise an error.

I also omitted scoping-specific methods such as unscoped and default_scoped for ease of reading - but these could perfectly be added to the proxy.

Feel free to expand the proxy implementation to support more use cases.
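As a rough illustration of such an extension, the sketch below forwards a longer list of chainable methods and checks that processing stays lazy. It is self-contained so it runs without Rails: LazyProcessor is a trimmed-down, hypothetical variant of the proxy, and FakeRelation is a stand-in that only records which methods were chained. Which methods are safe to forward depends on what your processing block expects.

```ruby
# Trimmed-down, hypothetical variant of the proxy idea.
class LazyProcessor
  # Methods forwarded to the wrapped relation. Extend this list with care.
  CHAINABLE = %i[where limit order offset].freeze

  attr_reader :relation

  def initialize(relation, &processor)
    @relation = relation
    @processor = processor
  end

  CHAINABLE.each do |name|
    define_method(name) do |*args, **kwargs|
      # Wrap the new relation in a new proxy, keeping the same processor
      self.class.new(@relation.public_send(name, *args, **kwargs), &@processor)
    end
  end

  # Processing only happens here, once, when results are requested
  def to_a
    @to_a ||= @processor.call(@relation.to_a)
  end
end

# Stand-in for an Active Record relation: it ignores the arguments and
# only records which methods were chained.
class FakeRelation
  attr_reader :calls

  def initialize(rows, calls = [])
    @rows = rows
    @calls = calls
  end

  %i[where limit order offset].each do |name|
    define_method(name) do |*_args, **_kwargs|
      FakeRelation.new(@rows, calls + [name])
    end
  end

  def to_a
    @rows.dup
  end
end

process_count = 0
proxy = LazyProcessor.new(FakeRelation.new([1, 2, 3])) do |rows|
  process_count += 1
  rows.map { |n| n * 2 }
end

chained = proxy.where(active: true).order(:id).limit(2)
count_before_load = process_count # processor has not run yet
result = chained.to_a
chained.to_a # memoized: the processor does not run again
```

Forwarding order and offset is usually harmless because they do not change the shape of the loaded records. The same cannot be assumed for group or select.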
