Rails: Background Job Done Right (Case 1)

If you continue to read this article, I assume that you know Ruby, OOP in Ruby, RoR, Active Record, Sidekiq, and ruby metaprogramming.

This article series is the implementation of sidekiq's best practice using real case from my work.

Perhaps you should read these articles before we start:
Coding Sidekiq Workers the Right Way
Improving Rails Performance with Better Background Jobs

Let's start our journey!

Table of Contents:
1. Problem
2. First Solution (bad)
3. Second Solution (better)
4. Third Solution (much better)
5. Flow Comparison
6. The Code
7. Final Word

1. Problem

Every 4 hours, I have to check if there is any Invoice that need to be checked to 3rd party API. If there is any, I have to do http request to 3rd API per record found. There are 6 types of Invoices. They are unique, so there are 6 models: InvoiceA..InvoiceF.

To do this, I have to use background job. And Sidekiq is my choice.

For the cron job, I use sidekiq-cron gems

2. First Solution (bad)

The example of my initial solution for the worker was:

# app/workers/scheduled/check_status_worker.rb
class Scheduled::CheckStatusWorker
  include Sidekiq::Worker

  def perform
    invoice_as = InvoiceA.where(status: "unfinished")
    check_status_of_(invoice_as)

    # looping b, c, d ,e

    invoice_fs = InvoiceF.where(status: "unfinished")
    check_status_of_(invoice_fs)
  end

  def check_status_of_(invoices)
    invoices.each do |invoice|
      invoice.check_status
    end
  end
end

Notice:

I don't use sidekiq_options retry: false. Well, if I use it, then if an exception happen in the middle of the job, the rest of the records won't be checked to 3rd party. Let say I have 10,000 records, and exception occured in 500th record, then 9,500 records won't be checked.

I use each method for each models. If there are 10,000 records per model, so this worker will be running for a loong time.

I call check_status from the instance model. This is not good. I should make a class to checked to 3rd party API. Remember this

I put the logic in the worker. This is also not good.

In this solution, I have 6 models and 1 worker.

What can we improve?

3. Second solution (better)

This solution consist of 3 phases.

First, I'll move the logic from worker to class. In this phase, I'll have 6 models, 1 worker, and 1 class (new).

The code of the worker and the class:

# app/workers/scheduled/check_status_worker.rb
class Scheduled::CheckStatusWorker
  include Sidekiq::Worker

  def perform
    StartScheduledCheckStatus.new.call
  end
end

# app/lib/start_scheduled_check_status.rb
class StartScheduledCheckStatus
  INVOICE = %w(invoice_a invoice_b invoice_c invoice_d invoice_e invoice_f)

  def call
    INVOICE.each { |inv| send("check_#{inv}" }
  end

  private

  INVOICE.each do |method|
    define_method "check_#{method}" do
      clazz = method.camelize
      invoices = clazz.constantize.where(status: "unfinished")
      check_status_of_(invoices)
    end
  end

  def check_status_of_(invoices)
    invoices.each do |invoice|
      invoice.check_status
    end
  end
end

In this phase, no improvement in perfomance. Only clean code.

Second, I'll move check_status from every models to stand alone classes. The check_status method for every models is simulated like this:

# app/models/invoice_a.rb
class InvoiceA < ApplicationRecord
  def check_status
    # action-1. build message
    # action-2. do http request
    # action-3. action based on the response
  end
end

The url of the API is the same (https://url/check_status), so action-2 is the same across models. Action-3 is the same across models. But the messages (action-1) is unique for each models.

Knowing this, I'll make 1 class to do action-2 and action-3, and 6 classes to do action-1.

In this phase, I'll have 6 models, 1 worker, and 1 class + 7 classes (new).

The code will be look like these:

# app/lib/start_scheduled_check_status.rb
class StartScheduledCheckStatus
  ...
  private

  INVOICE.each do |method|
    define_method "check_#{method}" do
      clazz = method.camelize
      invoices = clazz.constantize.where(status: "unfinished")
      check_status_of_(invoices, clazz: clazz)
    end
  end

  def check_status_of_(invoices, clazz: clazz)
    invoices.each do |invoice|
      CheckStatus.new(invoice, clazz).call
    end
  end
end

# app/lib/check_status.rb
class CheckStatus
  def initialize(invoice, clazz)
    @invoice = invoice
    @clazz   = clazz
  end

  def call
    message = "#{@clazz}MessageBuilder".constantize.new(@invoice).call
    # action-2: do http request using message
    # action-3: action based on the response
  end
end

# app/lib/invoice_a_message_builder.rb
class InvoiceAMessageBuilder
  def initialize(invoice)
    @invoice = invoice
  end

  def call
    # action-1: build message using @invoice
  end
end

Phew, that was long (remember, there are 6 message builder class). But in this phase, no improvement in perfomance. Only clean code. Let's go to next phase.

Third, I'll make a worker for CheckStatus. This is the improvement in performance. We push CheckStatus to a worker per record, so Scheduled::CheckStatusWorker does not have to run http request at all! If there are 10,000 records found, so there will be 10,000 jobs. This is background job done right!!

In this phase, I'll have 6 models, 1 worker + 1 worker (new), and 8 classes.

So, the code will be like these:

# app/lib/start_scheduled_check_status.rb
class StartScheduledCheckStatus
  ...

  def check_status_of_(invoices, clazz: clazz)
    invoices.each do |invoice|
      CheckStatusWorker.perform_async(invoice.id, clazz)
    end
  end
end

# app/workers/check_status_worker.rb
class CheckStatusWorker
  include Sidekiq::Worker
  sidekiq_options retry: false

  def perform(invoice_id, clazz)
    CheckStatus.new(invoice_id, clazz).call
  end
end

# app/lib/check_status.rb
class CheckStatus
  def initialize(invoice_id, clazz)
    @invoice_id = invoice_id
    @clazz      = clazz
    @invoice    = invoice
  end

  def call
    message = "#{@clazz}MessageBuilder".constantize.new(@invoice).call
    # action-2: do http request using message
    # action-3: action based on the response
  end

  private

  def invoice
    @clazz.constantize.find(@invoice_id)
  end
end

Notice:

Now I use sidekiq_options retry: false, because if one job is failed, it won't affect another jobs. Let say there are 10,000 jobs, 1 job failed the 9,999 other jobs will still running.

Can I improve again? Yes, we can!

4. Third Solution (much better)

This solution consist of 2 phases.

First, don't use each method. But use push_bulk method instead! Why should we use this?

The push_bulk method allows us to push a large number of jobs to Redis. This method cuts out the redis network round trip latency. I wouldn't recommend pushing more than 1000 per call but YMMV based on network quality, size of job args, etc. A large number of jobs can cause a bit of Redis command processing latency.

source: github

In this phase, I'll have 6 models, 2 workers, and 8 classes.

Let's upgrade our code a bit:

# app/lib/start_scheduled_check_status.rb
class StartScheduledCheckStatus
  INVOICE = %w(invoice_a invoice_b invoice_c invoice_d invoice_e invoice_f)
  SIZE = 1000

  def call
    INVOICE.each { |inv| send("check_#{inv}" }
  end

  private

  INVOICE.each do |method|
    define_method "check_#{method}" do
      clazz = method.camelize
      invoices_ids = get_invoices_ids_from_(clazz)
      check_status_of_(invoices_ids, clazz: clazz)
    end
  end

  def get_invoices_ids_from_(clazz)
    clazz.constantize.where(status: "unfinished").pluck(:id)
  end

  def check_status_of_(invoices_ids, clazz: clazz)
    args = get_array_of_args_from_(invoices_ids, clazz)
    args.each do |arg|
      Sidekiq::Client.push_bulk('class' => CheckStatusWorker, 'args' => arg)
    end
  end

  def get_array_of_args_from_(ids, clazz)
    arrs = ids.in_groups_of(SIZE, false)
    arrs.map! { |arr| arr.in_groups_of(1) }
    arrs.map! do |arr|
      arr.map! do |a|
        a << clazz
      end
    end
    arrs
  end
end

Notice:

I still use each. Before, every each means I put 1 invoice into a single job per invoice. But this time, every each means I put 1,000 invoices into a single job per invoice (bulk push).

The size of invoices is divided into 1000, as it is the best practice of sidekiq based on the official guide (see quoted text aboout push_bulk)

Second, we can improve our first worker. Have a little bit logic in worker is not a crime.

In this phase, I'll have 6 models, 2 workers + 1 worker (new), and 8 classes.

# app/workers/scheduled/check_status_worker.rb
class Scheduled::CheckStatusWorker
  include Sidekiq::Worker

  INVOICE = %w(invoice_a invoice_b invoice_c invoice_d invoice_e invoice_f)

  def perform
    INVOICE.each do |inv|
      SeparatedScheduledCheckStatusWorker.perform_async(inv)
    end
  end
end

# app/workers/separated_scheduled_check_status_worker.rb
class SeparatedScheduledCheckStatusWorker
  include Sidekiq::Worker

  def perform(inv)
    StartScheduledCheckStatus.new(inv).call
  end
end

# app/lib/start_scheduled_check_status.rb
class StartScheduledCheckStatus
  INVOICE = %w(invoice_a invoice_b invoice_c invoice_d invoice_e invoice_f)
  SIZE = 1000

  # inv is one of INVOICE
  def initialize(inv)
    @inv = inv
  end

  def call
    send("check_#{@inv}")
  end
  ...

Notice:

Before, we have 1 job for 6 models. Now we separate 1 job to become 6 jobs, which means 1 job per model.

This is another memory-usage improvement.

And that's all !

5. Flow Comparison

Flow in First Solution:

Every 4 hour, cron will start 1 job: Scheduled::CheckStatusWorker
This job will search unfinished invoice from Invoice A
Every invoice found will be checked one-by-one to 3rd party API using http request
The job will continue to search unfinished invoice from Invoice B
Once again, every invoice found will be checked one-by-one to 3rd party API
And over and over until Invoice F.

This is like The One Ring:

One Job to rule them all, One Job to find them, One Job to bring them all, and in the http request bind them.

Remember, The One Ring is the source of problems!

Flow in Third Solution:

Every 4 hour, cron will start 1 job: Scheduled::CheckStatusWorker
This job will start 6 other jobs: SeparatedScheduledCheckStatusWorker, one job for one model
All the 6 jobs will search unfinished invoice from their respective model
Every invoice found will be pushed bulk to job CheckStatusWorker. One job for one invoice

You can see the huge differences from First Solution and Third Solution. There will be huge improvement to our apps !

6. The Code

From 6 models and 1 worker, we end up have 6 models, 3 workers, and 8 classes.

This is the entire code:

# app/workers/scheduled/check_status_worker.rb
class Scheduled::CheckStatusWorker
  include Sidekiq::Worker

  INVOICE = %w(invoice_a invoice_b invoice_c invoice_d invoice_e invoice_f)

  def perform
    INVOICE.each do |inv|
      SeparatedScheduledCheckStatusWorker.perform_async(inv)
    end
  end
end

# app/workers/separated_scheduled_check_status_worker.rb
class SeparatedScheduledCheckStatusWorker
  include Sidekiq::Worker

  def perform(inv)
    StartScheduledCheckStatus.new(inv).call
  end
end

# app/lib/start_scheduled_check_status.rb
class StartScheduledCheckStatus
  INVOICE = %w(invoice_a invoice_b invoice_c invoice_d invoice_e invoice_f)
  SIZE = 1000

  def initialize(inv)
    @inv = inv
  end

  def call
    send("check_#{@inv}")
  end

  private

  INVOICE.each do |method|
    define_method "check_#{method}" do
      clazz = method.camelize
      invoices_ids = get_invoices_ids_from_(clazz)
      check_status_of_(invoices_ids, clazz: clazz)
    end
  end

  def get_invoices_ids_from_(clazz)
    clazz.constantize.where(status: "unfinished").pluck(:id)
  end

  def check_status_of_(invoices_ids, clazz: clazz)
    args = get_array_of_args_from_(invoices_ids, clazz)
    args.each do |arg|
      Sidekiq::Client.push_bulk('class' => CheckStatusWorker, 'args' => arg)
    end
  end

  def get_array_of_args_from_(ids, clazz)
    arrs = ids.in_groups_of(SIZE, false)
    arrs.map! { |arr| arr.in_groups_of(1) }
    arrs.map! do |arr|
      arr.map! do |a|
        a << clazz
      end
    end
    arrs
  end
end

# app/workers/check_status_worker.rb
class CheckStatusWorker
  include Sidekiq::Worker
  sidekiq_options retry: false

  def perform(invoice_id, clazz)
    CheckStatus.new(invoice_id, clazz).call
  end
end

# app/lib/check_status.rb
class CheckStatus
  def initialize(invoice_id, clazz)
    @invoice_id = invoice_id
    @clazz      = clazz
    @invoice    = invoice
  end

  def call
    message = "#{@clazz}MessageBuilder".constantize.new(@invoice).call
    # action-2: do http request using message
    # action-3: action based on the response
  end

  private

  def invoice
    @clazz.constantize.find(@invoice_id)
  end
end

# app/lib/invoice_a_message_builder.rb
class InvoiceAMessageBuilder
  def initialize(invoice)
    @invoice = invoice
  end

  def call
    # action-1: build message using @invoice
  end
end

Note: There are 6 models, so there are 6 message builder classes.

7. Final Word

Sorry for bad english. And sorry for bad naming classes, methods, and workers. I'll improve this article later.

source: myself

Blog

Rails: Background Job Done Right (Case 1)

K Putra

1. Problem

2. First Solution (bad)

3. Second solution (better)

4. Third Solution (much better)

5. Flow Comparison

6. The Code

7. Final Word

Join Our Newsletter. No Spam, Only the good stuff.

Related