Efficient Chunked File Downloads in Rails: Streaming CSV Exports

Davide Santangelo

Posted on October 7, 2024

In Rails applications, exporting large datasets into CSV files can be challenging, especially when the file size exceeds what memory can comfortably handle. In these cases, we can leverage Rails streaming to create and send the file in chunks, improving both performance and memory usage.

This article walks through an implementation of chunked CSV file downloads using Rails streaming to handle large datasets efficiently.

Why Streaming?

Using streaming for large CSV files offers substantial advantages:

  • Memory Efficiency: Sends chunks of data as they are generated, so Rails doesn’t need to load the entire dataset into memory.
  • User Experience: Downloads begin almost immediately, as data is sent incrementally rather than waiting for the entire file to be generated.
  • Performance: Loading records in batches keeps each query and its result set small, which eases pressure on both the database and the application server, especially on memory-limited systems.

This approach provides a robust and efficient way to handle large CSV downloads in Rails, conserving memory while improving user experience and system performance.

Problem Overview

Suppose we need to export a large amount of data, such as a list of users, transactions, or sales records, into a CSV file. A naive approach that loads all records at once can exhaust memory or run very slowly. Instead, we’ll implement a download method that generates and sends the CSV file in chunks as it’s built, so users can start downloading right away without straining memory.

Setting Up the Model

In our example, we’ll use a User model, which we want to export into a CSV file. Each row will contain information such as the user’s name, email, registration date, and other relevant details.

Model Methods for CSV Export

The User model needs methods to generate CSV data for each row and to stream data to an output stream, such as response.stream in the controller.

require "csv"

class User < ApplicationRecord
  # Builds the whole CSV string in memory; suitable for small exports only
  def self.to_csv
    CSV.generate(headers: true) do |csv|
      csv << csv_headers
      find_each(batch_size: 2000) { |user| csv << csv_row(user) }
    end
  end

  # Writes the CSV line by line to any object that responds to #write
  def self.stream_csv_to(output_stream)
    output_stream.write CSV.generate_line(csv_headers)
    find_each(batch_size: 2000) do |user|
      output_stream.write CSV.generate_line(csv_row(user))
    end
  end

  def self.csv_headers
    csv_attributes.map { |attr| human_attribute_name(attr) }
  end

  def self.csv_row(user)
    csv_attributes.map { |attr| user.public_send(attr) }
  end

  # Named csv_attributes to avoid confusion with ActiveRecord's attribute APIs
  def self.csv_attributes
    %w[name email registration_date city country created_at]
  end
end

Explanation of the Methods

  1. to_csv: Generates the entire CSV content in memory. This is suitable for smaller datasets but should be avoided for large files due to memory usage.
  2. stream_csv_to: Streams CSV data directly to an output_stream, using find_each to load records in batches, which is memory-efficient and suitable for large files.
  3. csv_headers and csv_row: Build the header line and each data row from a fixed list of attribute names defined on the model, so the columns stay in one place.
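
The streaming method only needs an object that responds to write, so the idea can be tried outside Rails. The following is a minimal sketch under stated assumptions: Person, COLUMNS, and stream_people_csv are hypothetical names, a StringIO stands in for response.stream, and each_slice stands in for the batching that find_each performs.

```ruby
require "csv"
require "stringio"

# Hypothetical stand-in for a User record; not part of the original model.
Person = Struct.new(:name, :email, :city)

COLUMNS = %i[name email city].freeze

# Writes a header line, then one CSV line per record, to any object that
# responds to #write (a StringIO here, response.stream in a controller).
# each_slice stands in for ActiveRecord's find_each batching.
def stream_people_csv(people, output, batch_size: 2)
  output.write CSV.generate_line(COLUMNS.map(&:to_s))
  people.each_slice(batch_size) do |batch|
    batch.each do |person|
      output.write CSV.generate_line(COLUMNS.map { |col| person.public_send(col) })
    end
  end
end

people = [
  Person.new("Ada", "ada@example.com", "London"),
  Person.new("Grace, Jr.", "grace@example.com", "New York"),
  Person.new("Linus", "linus@example.com", "Helsinki")
]

buffer = StringIO.new
stream_people_csv(people, buffer)
puts buffer.string
```

CSV.generate_line takes care of quoting, so a value that contains a comma still ends up in a single field.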

Setting Up the Controller: Streaming the Response

In the UsersController, configure the index action to handle both HTML and CSV responses. The CSV response uses streaming to send data to the client as it’s generated.

class UsersController < ApplicationController
  include ActionController::Live

  def index
    @pagy, @users = pagy User.all

    respond_to do |format|
      format.html do
        render "index"
      end
      format.csv do
        set_csv_headers
        stream_csv
      end
    end
  end

  private

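  def set_csv_headers
    # text/csv is the media type for CSV; text/event-stream is meant for server-sent events
    response.headers["Content-Type"] = "text/csv; charset=utf-8"
    response.headers["Content-Disposition"] = %(attachment; filename="users-#{Date.today}.csv")
    response.headers["Cache-Control"] = "no-cache"
    # Commonly cited workaround that keeps Rack::ETag from buffering the body
    response.headers["Last-Modified"] = Time.now.httpdate
  end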

  def stream_csv
    User.stream_csv_to(response.stream)
  ensure
    response.stream.close
  end
end
  • index Action: Sets up the response format for HTML or CSV. When CSV is requested, it prepares headers and streams the CSV data.
  • set_csv_headers: Configures headers for the CSV download, setting it as a file attachment and disabling caching.
  • stream_csv: Uses the stream_csv_to method in User to stream data directly to response.stream, chunking data and conserving memory.
  • ensure Block: Ensures response.stream is closed after streaming to prevent resource leaks.
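
The role of the ensure block can be illustrated without a running server. This is a sketch only: FakeStream and stream_rows are hypothetical names, and FakeStream merely imitates the write and close calls used on response.stream above.

```ruby
require "csv"

# Hypothetical stand-in for response.stream: records chunks and can be closed.
class FakeStream
  attr_reader :chunks

  def initialize
    @chunks = []
    @closed = false
  end

  def write(chunk)
    raise IOError, "closed stream" if @closed

    @chunks << chunk
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end
end

# Mirrors the shape of stream_csv: write rows, and always close the stream.
def stream_rows(stream, rows)
  rows.each { |row| stream.write(CSV.generate_line(row)) }
ensure
  stream.close
end

ok = FakeStream.new
stream_rows(ok, [%w[name email], %w[Ada ada@example.com]])

# A row source that fails partway through, as a database error might.
failing_rows = Enumerator.new do |yielder|
  yielder << %w[name email]
  raise "simulated failure while loading rows"
end

broken = FakeStream.new
begin
  stream_rows(broken, failing_rows)
rescue RuntimeError => e
  puts "stream closed after error: #{broken.closed?} (#{e.message})"
end
```

In both runs the stream ends up closed, which is the property the ensure block in the controller is there to guarantee.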

To implement streaming CSV downloads in Rails, it’s important to include ActionController::Live in the controller. This module allows streaming responses directly from the controller to the client, enabling the efficient chunked delivery of large datasets.

However, using ActionController::Live can sometimes lead to unexpected issues, particularly with authentication libraries like Devise. Devise may raise session- or Warden-related errors when ActionController::Live is active, as discussed in this GitHub issue. This happens because ActionController::Live runs the action in a separate thread, which can conflict with how Devise and Warden handle sessions and authentication failures.
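
One commonly reported symptom is an uncaught throw of :warden. Warden signals a failed authentication with throw :warden and expects its middleware to catch it, but Ruby resolves catch and throw per thread. The plain-Ruby sketch below shows that mechanism in isolation; it involves no Rails, Devise, or Warden code, and it assumes this is the failure mode you are seeing.

```ruby
# The catch block plays the role of the Warden middleware in the parent thread.
outcome = catch(:warden) do
  worker = Thread.new do
    begin
      # The streamed action runs here, in a different thread.
      throw :warden, :unauthenticated
    rescue UncaughtThrowError => e
      # The matching catch lives in the parent thread, so the throw fails.
      "#{e.class}: #{e.message}"
    end
  end
  worker.value
end

puts outcome
```

The throw never reaches the surrounding catch; it surfaces as an UncaughtThrowError inside the worker thread instead.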

Possible Workaround

If this issue occurs, a possible workaround is to handle authentication before starting the streaming process. Alternatively, consider moving the logic for large CSV exports to background jobs or services that can store the file temporarily and provide a direct download link, avoiding session conflicts with Devise altogether.

Using ActionController::Live requires careful handling, particularly with Devise, so test thoroughly to ensure compatibility with the rest of your application’s authentication logic.
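
For the background-job route, the export itself can be written to a temporary file in batches. The sketch below uses only Ruby's standard library; export_csv_to_tempfile is a hypothetical helper, and the batches argument stands in for whatever a job would get from find_in_batches. Storing the file and handing out a download link are left out.

```ruby
require "csv"
require "tempfile"

# Hypothetical export routine a background job could call. `batches` is any
# enumerable of arrays of rows, standing in for ActiveRecord batches.
def export_csv_to_tempfile(headers, batches)
  file = Tempfile.new(["users-export", ".csv"])
  CSV.open(file.path, "w") do |csv|
    csv << headers
    batches.each do |batch|
      batch.each { |row| csv << row }
    end
  end
  file
end

rows = [
  ["Ada", "ada@example.com"],
  ["Grace", "grace@example.com"],
  ["Linus", "linus@example.com"]
]

file = export_csv_to_tempfile(%w[name email], rows.each_slice(2))
content = File.read(file.path)
puts content
file.close! # closes and deletes the temporary file
```

Only one batch of rows is held in memory at a time, and the request that triggers the export never has to stream anything.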

Server Configuration

When configuring nginx to handle streaming responses with Rails, especially for CSV downloads via ActionController::Live, certain adjustments need to be made to ensure the connection remains stable and efficient. Below is an example configuration for the "/users" location block, which directs requests to the Puma server.

Nginx Configuration for Streaming CSV Exports

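Here’s a configuration snippet for proxying streamed responses. ActionController::Live sends the body over plain HTTP/1.1 using chunked transfer encoding; it does not rely on a WebSocket upgrade, so the settings that matter most are the HTTP version, buffering, and timeouts: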

location /users {
    proxy_pass http://puma;
    proxy_http_version 1.1;

    # Upgrade headers matter only if this location also serves WebSockets;
    # plain HTTP streaming with ActionController::Live does not need them
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";

    # Set forwarding headers for client information and host details
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-Proto https;

    # Do not rewrite Location and Refresh headers in upstream responses
    proxy_redirect off;

    # Important: pass response data to the client as soon as it arrives
    # instead of buffering it in Nginx
    proxy_buffering off;
    # Likewise, pass request bodies to Puma without buffering them first
    proxy_request_buffering off;

    # Allow up to 300 seconds between successive reads from or writes to Puma
    proxy_read_timeout 300;
    proxy_send_timeout 300;
}
  • proxy_http_version 1.1: Makes Nginx talk HTTP/1.1 to Puma, which chunked transfer encoding requires (as do WebSocket upgrades, if the location also serves them).
  • proxy_set_header directives: Forward the original host, client IP, and protocol so the Rails server sees accurate request information.
  • proxy_buffering off and proxy_request_buffering off: Disable buffering. The first is the critical one for streaming: with response buffering on, Nginx collects the upstream response in memory buffers and temporary files, which can delay delivery to the client. The second applies to request bodies sent by the client and has no effect on the download itself.
  • proxy_read_timeout and proxy_send_timeout: Set extended timeouts, in seconds, to avoid dropped connections. They limit the gap between two successive read or write operations rather than the total transfer time, so tune them to the longest pause you expect between chunks.
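
Buffering can also be switched off per response from the Rails side: Nginx honors an X-Accel-Buffering: no response header unless proxy_ignore_headers tells it otherwise. The sketch below builds such a header set; csv_stream_headers is a hypothetical helper and the file name pattern is an assumption carried over from the controller above.

```ruby
require "date"

# Hypothetical helper: builds response headers for a streamed CSV download.
def csv_stream_headers(basename, date: Date.today)
  {
    "Content-Type" => "text/csv; charset=utf-8",
    "Content-Disposition" => %(attachment; filename="#{basename}-#{date.iso8601}.csv"),
    "Cache-Control" => "no-cache",
    # Asks Nginx not to buffer this particular response.
    "X-Accel-Buffering" => "no"
  }
end

headers = csv_stream_headers("users", date: Date.new(2024, 10, 7))
headers.each { |name, value| puts "#{name}: #{value}" }
```

In a controller, each pair could be copied into response.headers before the first write to the stream. This keeps the opt-out next to the code that needs it instead of tying it to one location block.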

Potential Issues with ActionController::Live

Even with this configuration, compatibility issues can remain when ActionController::Live is combined with authentication or session management libraries such as Devise. These problems arise inside the Rails process, where the streamed action runs in its own thread, so proxy settings alone are unlikely to resolve them if the authentication layer is not designed for thread-safe usage in Rails.
