Power of ActiveRecord and when to stop using it

sooyang

Soo Yang, Tan

Posted on January 31, 2019

Power of ActiveRecord and when to stop using it

In Rails, ActiveRecord makes it super easy to query for data from the database--but how well does it perform for your use case?

Scenario

Let's assume I have a Project model on my rails app. I would like to display the total number of project that are active and pending at a status against the total number of active projects on a dashboard.

Which would look something like this:

  Team Lead Action (25/100) | Manager Action (32/100)

Implementation

Thanks to ActiveRecord, we can easily achieve get the total count:

  active_projects = Project.where(status: 'active')
  @total_number_of_active_projects = active_projects.size
  @projects_pending_team_lead_approval = active_projects.reject(&:team_lead_action_time).count
  @projects_pending_manager_approval = active_projects.reject(&:manager_action_time).count

Voila! We can easily call these variables to display them on the dashboard. Problem solved!

However, I'm a little concerned if the amount of projects scales to a large amount. For example, just try imagining if we now need to display 5 different values, our queries will scale like this:

active_projects = Project.where(status: 'active')
  @total_number_of_active_projects = active_projects.size
  @projects_pending_team_lead_approval = active_projects.reject(&:team_lead_action_time).count
  @projects_pending_manager_approval = active_projects.reject(&:manager_action_time).count
  @projects_pending_manager_2_approval = active_projects.reject(&:manager_2_action_time).count
  @projects_pending_manager_3_approval = active_projects.reject(&:manager_3_action_time).count
  @projects_pending_manager_4_approval = active_projects.reject(&:manager_4_action_time).count
  @projects_pending_manager_5_approval = active_projects.reject(&:manager_5_action_time).count

That don't look very scalable. For such simple data value, it should be retrieved efficiently.

Let's verify things using data with the help of Benchmark with N = 1000:


                        user     system      total        real
count_active_project  0.066184   0.108178   0.174362 (  0.182557)

It takes about 0.18s. This time let's just change the query:

  active_projects = Project.where(status: 'active')
  @total_number_of_active_projects = active_projects.size
  @projects_pending_team_lead_approval = active_projects.where(team_lead_action_time: nil).size
  @projects_pending_manager_approval = active_projects.where(manager_action_time: nil).size

And benchmark it:


                              user     system      total        real
count_active_projects_ar    0.001460   0.000166   0.001626 (  0.002965)

This query which does the same thing takes only 0.002s - significantly faster. The difference is this time we are not using .reject - ( array method ) but instead relying on ActiveRecord. But why? Let's take a look at the logs.

Using .reject:

irb(main):036:0> Project.where(status: 'active').reject(&:team_lead_action_time).count
  Project Load (2.2ms)  SELECT "projects".* FROM "projects" WHERE "projects"."status" = $1  [["status", 0]]

Using ActiveRecord:

irb(main):040:0> Project.where(status: 'active', team_lead_action_time: nil).size
   (0.9ms)  SELECT COUNT(*) FROM "projects" WHERE "projects"."status" = $1 AND "projects"."team_lead_action_time" IS NULL  [["status", 0]]

As we can see from the output, .reject load the projects into the memory before processing while ActiveRecord will let the database handle the query. Database is fast and was designed to handle such process hence the speed improvement.

Annnd there's raw SQL

Taking a step further, we can write raw SQL in rails.

  query = <<~SQL
    SELECT
      SUM(CASE WHEN status = 0 THEN 1 ELSE 0 END),
      SUM(CASE WHEN status = 0 AND team_lead_action_time IS null THEN 1 ELSE 0 END),
      SUM(CASE WHEN status = 0 AND manager_action_time IS null THEN 1 ELSE 0 END)
    FROM
      PROJECTS
  SQL

  ActiveRecord::Base.connection.execute(query)

And the result is:


                              user     system      total        real
count_active_projects_sql    0.000250   0.000047   0.000297 (  0.000966)

Raw SQL only takes 0.0009s which is about 20 times faster.

Taking a look at the log:

irb(main):044:0> Project.count_active_projects_sql
   (0.8ms)  SELECT
  SUM(CASE WHEN status = 0 AND team_lead_action_time IS null THEN 1 ELSE 0 END)
FROM
  PROJECTS
=> #<PG::Result:0x00007fa3cdc1c460 status=PGRES_TUPLES_OK ntuples=1 nfields=1 cmd_tuples=1>

Raw SQL is significantly faster because it does not need to instantiate a new ActiveRecord objects which is slow but instead it returns a PG::Result object.

If we increase N = 100000, the benchmark result will be:

                              user     system      total        real
count_active_projects      4.428081   7.626659  12.054740 ( 12.177786)
count_active_projects_ar   0.002107   0.000303   0.002410 (  0.062542)
count_active_projects_sql  0.000380   0.000057   0.000437 (  0.026585)

We can clearly see that using raw SQL is the fastest 馃弳.

Takeaway

1) In the end, I decided to go with the raw SQL method. Putting performance in mind, raw SQL is suited in handling such scenario.
2) Between raw SQL and ActiveRecord, in most cases ActiveRecord would be preferable for developers productivity.
3) Be careful with the methods used. Ruby .reject - (array methods) are great but use them with caution.

Depending on the scenario, solutions may vary. We can write our code creatively and test it out to fit our needs. We need to stay curious and be creative in improving our solutions.

** The example code can be found here https://github.com/sooyang/query_speed_example

馃挅 馃挭 馃檯 馃毄
sooyang
Soo Yang, Tan

Posted on January 31, 2019

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related