Heroku, I stream you!

megatux

Cristian Molina

Posted on August 9, 2021

Heroku, I stream you!

The problem: timeouts on generated content exported as CSV or other formats.

It's a very common feature to have export functionality on the page to formats like CSV, XLSX, PDF, etc.

This is a simple thing on Rails inside a #respond_to:

def index
  @records = ...
  respond_to do |format|
    format.html # index.html
    format.csv { some_csv_generation_code }
  end
end
Enter fullscreen mode Exit fullscreen mode

However, some day, errors start to appear from users when they try to download a lot of information.

The server responds with timeout errors after a certain amount of seconds and that stops the request that was generating the file.

In a PaaS like Heroku this time limit is at 30 seconds of no content generated. Also there is another timeout of 55’’ between each new fragment of output. Citing the official information:

An application has an initial 30 second window to respond with a single byte back to the client. However, each byte transmitted thereafter (either received from the client or sent by your application) resets a rolling 55 second window. If no data is sent during the 55 second window, the connection will be terminated.

Rails seems to have streaming functionality to handle this kind of output generation in the ActionController::Live. Also in Rails edge (future v7) there is a #send_stream method that seems to achieve this functionality so I copied it to my Rails 5.2 app into a controller concern and changed the CSV generation logic to implement the Enumerable #each method.
It was not working until I also override the Last-Modified header. See this issue discussion.

This is the full code of the controller concern:

require 'content_disposition'

module Streamable
  extend ActiveSupport::Concern
  include ActionController::Live

  # Almost verbatim copy of new Rails 7 method.
  # See https://edgeapi.rubyonrails.org/classes/ActionController/Live.html#method-i-send_stream
  def send_stream(filename:, disposition: 'attachment', type: nil)
    response.headers['Content-Type'] =
      (type.is_a?(Symbol) ? Mime[type].to_s : type) ||
      Mime::Type.lookup_by_extension(File.extname(filename).downcase.delete('.')) ||
      'application/octet-stream'

    response.headers['Content-Disposition'] = ContentDisposition.format(disposition: disposition, filename: filename)
      # with rails 6 remove content_disposition gem and use:
      #  ActionDispatch::Http::ContentDisposition.format(disposition: disposition, filename: filename)

    # Extra: needed for streaming
    response.headers['Last-Modified'] = Time.now.httpdate

    yield response.stream
  ensure
    response.stream.close
  end
end

Enter fullscreen mode Exit fullscreen mode

For Rails 5.x I needed an extra gem to set the Content-Disposition header:

gem 'content_disposition', '~> 1.0' # Not needed on Rails 6

A testing example of a controller that generates slow dummy data:

class ExporterController < ApplicationController
  include Streameable

  def index
    respond_to do |format|
      format.html # index.html
      format.js   # index.js
      format.csv do
        send_stream(attachment_opts) do |stream|
          stream.write "email_address,updated_at\n"

          50.times.each do |i|
            line = "pepe_#{i}@acme.com,#{Time.zone.now}\n"
            stream.write line
            puts line
            sleep 1  # force slow response
          end
        end
      end
    end
  end

  private

  def attachment_opts
    {
      filename: "data_#{Time.zone.now.to_i}.csv",
      disposition: 'attachment',
      type: 'text/csv'
    }
  end
end
Enter fullscreen mode Exit fullscreen mode

Finally, if you use something like curl you will see the output generated second by second.

curl -i http://localhost:3000/exporter

Alt Text

Now your app can stream big amounts of data like a champ!.

I have deployed this app in a Heroku app and the code is on this github repo.

An extra performance tip. When using ActiveRecord to fetch tons of records from the DB consider using #find_each instead of #each so it will fetch the records in batches.

Let me know if this was helpful for you.
Happy Rails hacking!

💖 💪 🙅 🚩
megatux
Cristian Molina

Posted on August 9, 2021

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related