Davide Santangelo
Posted on October 7, 2024
In Rails applications, exporting large datasets into CSV files can be challenging, especially when the file size exceeds what memory can comfortably handle. In these cases, we can leverage Rails streaming to create and send the file in chunks, improving both performance and memory usage.
This article walks through an implementation of chunked CSV file downloads using Rails streaming to handle large datasets efficiently.
Why Streaming?
Using streaming for large CSV files offers substantial advantages:
- Memory Efficiency: Sends chunks of data as they are generated, so Rails doesn’t need to load the entire dataset into memory.
- User Experience: Downloads begin almost immediately, as data is sent incrementally rather than waiting for the entire file to be generated.
- Performance: Loading records in batches keeps each query and its result set small, which matters most on memory-limited systems.
This approach provides a robust and efficient way to handle large CSV downloads in Rails, conserving memory while improving user experience and system performance.
Problem Overview
Suppose we need to export a large amount of data, such as a list of users, transactions, or sales records, into a CSV file. A naive approach that loads all records at once can exhaust the available memory or become very slow. Instead, we’ll implement a download method that generates the CSV and sends it in chunks as it is built, so users can start downloading right away without the server holding the whole file in memory.
Setting Up the Model
In our example, we’ll use a User model, which we want to export into a CSV file. Each row will contain information such as the user’s name, email, registration date, and other relevant details.
Model Methods for CSV Export
The User model needs methods to generate CSV data for each row and to stream data to an output stream, such as response.stream in the controller.
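```ruby
require "csv"

class User < ApplicationRecord
  # Builds the whole CSV in one String: fine for small exports only.
  def self.to_csv
    CSV.generate(headers: true) do |csv|
      csv << csv_headers
      find_each(batch_size: 2000) { |user| csv << csv_row(user) }
    end
  end

  # Writes one CSV line at a time to any object that responds to #write,
  # such as response.stream in an ActionController::Live action.
  def self.stream_csv_to(output_stream)
    output_stream.write CSV.generate_line(csv_headers)
    find_each(batch_size: 2000) do |user|
      output_stream.write CSV.generate_line(csv_row(user))
    end
  end

  def self.csv_headers
    csv_attributes.map { |attr| human_attribute_name(attr) }
  end

  def self.csv_row(user)
    csv_attributes.map { |attr| user.public_send(attr) }
  end

  # Named csv_attributes (not attributes) to avoid clashing with
  # ActiveRecord's own attribute-related methods.
  def self.csv_attributes
    %w[name email registration_date city country created_at]
  end
end
```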
Explanation of the Methods
- to_csv: Generates the entire CSV content in memory. This is suitable for smaller datasets but should be avoided for large files due to memory usage.
- stream_csv_to: Streams CSV data directly to an output_stream, using find_each to load records in batches, which is memory-efficient and suitable for large files.
- csv_headers and csv_row: Build the header row (labels come from human_attribute_name) and each data row from the fixed list of attribute names defined in the model.
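Because stream_csv_to only calls write on its argument, the pattern can be tried outside Rails. The sketch below is a minimal, hypothetical stand-in: FakeUser, its in-memory records, and RecordingStream exist only for illustration, find_each is imitated with each_slice, and capitalize replaces human_attribute_name. It shows that streaming produces exactly the same bytes as to_csv, delivered as one write per line.

```ruby
require "csv"

# Hypothetical stand-in for the User model so the pattern runs without Rails.
class FakeUser
  Row = Struct.new(:name, :email)

  RECORDS = [
    Row.new("Ada Lovelace", "ada@example.com"),
    Row.new("Grace Hopper", "grace@example.com"),
    Row.new("Alan Turing", "alan@example.com")
  ].freeze

  # Mimics find_each: walks the records in slices of batch_size.
  def self.find_each(batch_size: 2, &block)
    RECORDS.each_slice(batch_size) { |batch| batch.each(&block) }
  end

  def self.csv_attributes
    %w[name email]
  end

  def self.csv_headers
    csv_attributes.map(&:capitalize) # stands in for human_attribute_name
  end

  def self.csv_row(user)
    csv_attributes.map { |attr| user.public_send(attr) }
  end

  # Whole document in a single String.
  def self.to_csv
    CSV.generate do |csv|
      csv << csv_headers
      find_each { |user| csv << csv_row(user) }
    end
  end

  # One write per line to any object that responds to #write.
  def self.stream_csv_to(output_stream)
    output_stream.write CSV.generate_line(csv_headers)
    find_each { |user| output_stream.write CSV.generate_line(csv_row(user)) }
  end
end

# Records each write, standing in for response.stream.
class RecordingStream
  attr_reader :writes

  def initialize
    @writes = []
  end

  def write(data)
    @writes << data
  end
end

stream = RecordingStream.new
FakeUser.stream_csv_to(stream)
puts stream.writes.size                     # prints 4 (header + 3 rows)
puts stream.writes.join == FakeUser.to_csv  # prints true
```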
Setting Up the Controller: Streaming the Response
In the UsersController, configure the index action to handle both HTML and CSV responses. The CSV response uses streaming to send data to the client as it’s generated.
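```ruby
class UsersController < ApplicationController
  include ActionController::Live

  def index
    respond_to do |format|
      format.html do
        # Paginate only for the HTML view; the CSV export streams every user.
        @pagy, @users = pagy(User.all)
        render "index"
      end
      format.csv do
        set_csv_headers
        stream_csv
      end
    end
  end

  private

  def set_csv_headers
    # text/csv, not text/event-stream: this is a file download, not Server-Sent Events.
    response.headers["Content-Type"] = "text/csv"
    response.headers["Content-Disposition"] = %(attachment; filename="users-#{Date.current}.csv")
    response.headers["Cache-Control"] = "no-cache"
    # Commonly set so Rack::ETag does not buffer the whole body to compute an ETag.
    response.headers["Last-Modified"] = Time.now.httpdate
  end

  def stream_csv
    User.stream_csv_to(response.stream)
  ensure
    # Always close the stream, even if an exception interrupts the export.
    response.stream.close
  end
end
```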
- index Action: Sets up the response format for HTML or CSV. When CSV is requested, it prepares headers and streams the CSV data.
- set_csv_headers: Configures the response headers for the download: a CSV attachment with a dated filename, Cache-Control: no-cache (caches must revalidate), and a Last-Modified timestamp.
- stream_csv: Calls User.stream_csv_to with response.stream, so each CSV line is written to the client as soon as it is generated and only one batch of records is held in memory at a time.
- ensure Block: Closes response.stream even if an exception is raised mid-export, so the connection is not left open and resources are not leaked.
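In stream_csv_to, every CSV line is its own call to response.stream.write, and each write is typically sent as a separate small chunk. If that per-write overhead matters for very large exports, one option is a small buffering wrapper. ChunkedWriter below is a hypothetical class, not part of Rails, and the 64 KB default threshold is an arbitrary assumption; any object with the same write method can sit in front of it.

```ruby
# Hypothetical wrapper (not part of Rails): accumulates CSV lines and
# forwards them to the underlying stream in larger pieces.
class ChunkedWriter
  def initialize(stream, flush_bytes: 64 * 1024)
    @stream = stream
    @flush_bytes = flush_bytes
    @buffer = +""
  end

  # Same interface as response.stream#write, so it can be passed to stream_csv_to.
  def write(data)
    @buffer << data
    flush if @buffer.bytesize >= @flush_bytes
  end

  # Sends whatever is buffered; call once more after the last row.
  def flush
    return if @buffer.empty?

    @stream.write(@buffer)
    @buffer = +""
  end
end

# Usage with an Array as the downstream "stream" (write is added as a singleton method).
sink = []
def sink.write(data)
  push(data)
end

writer = ChunkedWriter.new(sink, flush_bytes: 10)
10.times { writer.write("a,1\n") }
writer.flush
p sink.map(&:bytesize) # prints [12, 12, 12, 4]
```

In the controller this would become something like writer = ChunkedWriter.new(response.stream), then User.stream_csv_to(writer), then writer.flush, with response.stream.close still in the ensure block (an untested sketch). The trade-off is latency: the client receives nothing until the first threshold is reached.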
To implement streaming CSV downloads in Rails, it’s important to include ActionController::Live in the controller. This module allows streaming responses directly from the controller to the client, enabling the efficient chunked delivery of large datasets.
However, using ActionController::Live can lead to unexpected issues, particularly with authentication libraries like Devise. Devise may raise session- or Warden-related errors when ActionController::Live is active, as discussed in this GitHub issue. This happens because ActionController::Live runs the action in a separate thread so that it can stream, which conflicts with how Devise and Warden handle the session and authentication failures for the request.
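One way to see why the separate thread matters, assuming Warden's usual design as we understand it (its middleware wraps the app in catch(:warden), and a failed authenticate_user! does throw(:warden)): Ruby's catch/throw does not cross thread boundaries. The plain-Ruby sketch below reproduces only that mechanism; Warden and Devise are not loaded, and the method names are hypothetical.

```ruby
# Illustrates Ruby semantics only; no Warden or Devise code is involved.

# Same thread: the throw is caught, as Warden's middleware expects.
def throw_on_same_thread
  catch(:warden) do
    throw :warden, :unauthenticated # caught: catch returns :unauthenticated
    :rendered                       # never reached
  end
end

# Different thread: there is no matching catch on the thread that throws.
def throw_on_other_thread
  catch(:warden) do
    worker = Thread.new do
      Thread.current.report_on_exception = false # silence the default stderr report
      throw :warden, :unauthenticated            # raises UncaughtThrowError on this thread
    end
    worker.join # re-raises the worker's exception here; catch does not intercept it
    :rendered
  end
rescue UncaughtThrowError => e
  e
end

p throw_on_same_thread        # prints :unauthenticated
p throw_on_other_thread.class # prints UncaughtThrowError
```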
Possible Workaround
If this issue occurs, a possible workaround is to handle authentication before starting the streaming process. Alternatively, consider moving the logic for large CSV exports to background jobs or services that can store the file temporarily and provide a direct download link, avoiding session conflicts with Devise altogether.
Using ActionController::Live requires careful handling, particularly with Devise, so test thoroughly to ensure compatibility with the rest of your application’s authentication logic.
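A different way to sidestep the separate thread, named here as an alternative rather than the approach used above, is to assign an Enumerator to the controller's response_body instead of including ActionController::Live. The action and its authentication callbacks then run on the normal request thread, and the Rack server calls each on the body, writing every string as it is yielded. The sketch below builds such a lazy body in plain Ruby; csv_enumerator and Person are hypothetical names, and a small array replaces User.find_each.

```ruby
require "csv"

# Builds the CSV lazily: a line is generated only when the consumer asks for it
# (a Rack server calls #each on the body and writes each yielded String).
def csv_enumerator(records, attributes)
  Enumerator.new do |yielder|
    yielder << CSV.generate_line(attributes)
    records.each do |record|
      yielder << CSV.generate_line(attributes.map { |attr| record.public_send(attr) })
    end
  end
end

Person = Struct.new(:name, :email) # hypothetical record type

people = [Person.new("Ada", "ada@example.com"), Person.new("Grace", "grace@example.com")]
body = csv_enumerator(people, %w[name email])
body.each { |line| print line }
```

In a controller, the idea would be roughly self.response_body = csv_enumerator(User.find_each(batch_size: 2000), User.csv_attributes) after setting the same headers (find_each without a block returns an enumerator; csv_attributes assumes the model shown earlier). This assumes nothing in the middleware stack buffers the whole body first, and it has not been verified against every setup.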
Server configuration
When configuring nginx to handle streaming responses with Rails, especially for CSV downloads via ActionController::Live, certain adjustments need to be made to ensure the connection remains stable and efficient. Below is an example configuration for the "/users" location block, which directs requests to the Puma server.
Nginx Configuration for Streaming CSV Exports
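Here’s a configuration snippet for that location. The directives that matter for streaming are proxy_http_version 1.1 and the buffering settings; the Upgrade and Connection headers are only needed if the same location also serves WebSocket traffic (for example Action Cable), because ActionController::Live streams over ordinary HTTP/1.1 chunked responses without any protocol upgrade: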
```nginx
location /users {
    proxy_pass http://puma;
    proxy_http_version 1.1;

    # Only needed if this location also carries WebSocket traffic;
    # ActionController::Live streaming works without a protocol upgrade
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";

    # Forward client and host details (X-Forwarded-Proto assumes TLS ends at Nginx)
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-Proto https;

    # Do not rewrite Location/Refresh headers in upstream redirects
    proxy_redirect off;

    # Important for streaming: pass each chunk to the client as soon as it arrives
    proxy_buffering off;
    # Controls buffering of the client request body (uploads), not the response
    proxy_request_buffering off;

    # Timeouts (seconds) between two successive reads/writes, not for the whole transfer
    proxy_read_timeout 300;
    proxy_send_timeout 300;
}
```
- proxy_http_version 1.1: Nginx speaks HTTP/1.0 to upstream servers by default; HTTP/1.1 lets Puma use chunked transfer encoding and keep-alive, and is required for WebSocket upgrades.
- proxy_set_header directives: Sets headers for forwarding client details, ensuring accurate client information is passed to the Rails server.
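- proxy_buffering off and proxy_request_buffering off: proxy_buffering off is the critical one for downloads: Nginx relays each chunk to the client as soon as it arrives from Puma instead of collecting the response in its buffers (and temporary files) first, which would delay or batch the data. proxy_request_buffering off applies to the client's request body, which matters for uploads rather than downloads.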
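- proxy_read_timeout and proxy_send_timeout: These timeouts apply between two successive read or write operations, not to the whole transfer. A long export is only cut off if Rails goes quiet for longer than 300 seconds between writes (for example, a slow query before the first row), so adjust them to the longest expected pause rather than the total file size.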
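To check that the whole chain (Nginx, Puma, Rails) is actually streaming, one option is to time when the first piece of the body reaches a client. The helper below is a hypothetical sketch built on Ruby's standard Net::HTTP streaming API (http.request with a block, then read_body with a block); the URL in the comment is a placeholder for your own export endpoint. If the first piece takes noticeably longer than the server needs to produce the header row, something in between is likely buffering.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: downloads url, reporting how soon the first piece of
# the body arrived (in seconds) and the total number of body bytes.
def first_chunk_latency(url)
  uri = URI(url)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  first_after = nil
  bytes = 0

  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(Net::HTTP::Get.new(uri)) do |response|
      # read_body with a block yields body pieces as they are read from the socket.
      response.read_body do |piece|
        first_after ||= Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
        bytes += piece.bytesize
      end
    end
  end

  { first_chunk_after: first_after, bytes: bytes }
end

# Example (placeholder URL):
# p first_chunk_latency("https://example.com/users.csv")
```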
Potential Issues with ActionController::Live
Even with this configuration, compatibility issues can remain when ActionController::Live is combined with authentication or session management libraries such as Devise. These errors come from the action running in a separate thread, as described above, rather than from Nginx, so proxy settings alone are unlikely to resolve them.