How to generate a PDF file in Rails using the Grover gem?

When it comes to generating PDF files from HTML templates, there are several libraries and tools available. In this article, we'll take a closer look at the Grover gem and discuss why we chose it over other alternatives like Prawn and wicked_pdf.

A modern way to generate PDF files in Rails

Challenges with wicked_pdf

Initially, we used wicked_pdf in some of our ongoing projects, and it worked well in most cases, enabling us to generate PDF files from our templates successfully. However, we ran into problems with certain Bootstrap features, such as the row and col grid classes and flexbox styles, which the library did not fully support. We worked around this by rewriting the affected views to use tables instead of flexbox or grid, but this is not an ideal solution, as it limits the flexibility of the view.

Additionally, we learned that wicked_pdf is built on top of wkhtmltopdf, an open-source command-line tool that uses the WebKit rendering engine to convert HTML to PDF and various image formats. The wkhtmltopdf project has been archived as of January 2, 2023, which made us reconsider our choice of wicked_pdf.

Introducing Grover

This led us to explore other PDF generation tools and libraries, and we discovered the Grover gem. Grover is a Ruby gem that uses Google Puppeteer and Chromium to convert HTML into PDF, PNG, or JPEG files. To use Grover, you need to install both the Grover gem and the Puppeteer npm package.

Advantages of Grover over wicked_pdf

One of the key advantages of Grover is its reputation and reliable community. It has been widely adopted by the Rails community due to its ease of use and versatility. Grover supports the latest CSS3 features and JavaScript libraries, making it a popular choice for complex PDF conversions.

Another advantage of Grover is its ability to render dynamic content, which is crucial for web applications that generate documents on the fly. This is achieved through Google Puppeteer, which provides a headless Chrome/Chromium environment to execute JavaScript and render dynamic content.

In contrast, wicked_pdf is a popular alternative for PDF generation in Rails, but it has limitations in terms of supporting the latest CSS3 and JavaScript features. Additionally, it may require additional configuration and setup for rendering dynamic content, which could be time-consuming.

Customization options

Grover offers a range of configuration options to control the PDF layout, including page size, orientation, margin, and header/footer content. Moreover, it allows you to inject custom CSS and JavaScript code into the HTML to further customize the PDF output. This provides a high degree of flexibility in terms of customization.
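
As a rough sketch of what such configuration might look like, the initializer below sets a few layout options globally. The option names follow Grover's README, which largely mirrors Puppeteer's PDF options. The file path, the PDF_OPTIONS constant, and all values are illustrative assumptions. The defined?(Grover) guard is only there so the snippet also loads outside a Rails app.

```ruby
# config/initializers/grover.rb (path and values are illustrative assumptions)
PDF_OPTIONS = {
  format: 'A4',
  landscape: false,
  margin: { top: '1cm', bottom: '1.5cm', left: '1cm', right: '1cm' },
  print_background: true, # keep CSS background colors and images
  display_header_footer: true,
  header_template: '<span></span>',
  # Puppeteer fills in elements with the pageNumber and totalPages classes.
  footer_template: '<div style="font-size: 8px; width: 100%; text-align: center;">' \
                   'Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
  # Extra CSS injected into the page before printing.
  style_tag_options: [{ content: 'nav, .sidebar { display: none; }' }]
}.freeze

# Apply the options globally when the gem is loaded.
if defined?(Grover)
  Grover.configure do |config|
    config.options = PDF_OPTIONS
  end
end
```

Options passed directly to Grover.new should take precedence over these global defaults, so a single export can still switch to landscape, for example.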

Methods for Generating PDF Files using Grover in Rails

Generating PDF files in Rails is a common task, and Grover is one of the gems that can help us accomplish it. In this blog post, we will explore two different methods for generating PDF files using Grover in Rails.

Render from template

Rendering a template is a common approach to generating PDF files in Rails. This method involves creating an HTML template that includes placeholders for dynamic data, or a custom template without a header or side navigation. Rails renders the template with the data, and Grover then converts the resulting HTML into a PDF file.

One advantage of this approach is that it provides complete control over the layout and formatting of the PDF output. This approach is particularly useful for generating documents such as invoices, receipts, and reports that require a specific layout and design.

To render a template with Grover, you can use the render_to_string method in the Rails controller. This method takes a view template and any necessary data and returns the HTML code as a string. However, this code only contains relative URLs for stylesheets, images, and other assets. As a result, there may be issues with missing stylesheets and fonts when generating PDFs using Grover. In such cases, it is important to set the display_url property correctly. This is because Grover defaults to using example.com as the host, which can cause errors when attempting to access stylesheets and fonts from other sites.

To prevent or resolve this issue, pass the rendered HTML string, along with the root URL and protocol of your server, to the Grover::HTMLPreprocessor helper. It returns the same HTML with the relative URLs converted to absolute ones.

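# Render the view to an HTML string (asset URLs are still relative here).
relative_html = PayslipController.new.render_to_string({
  template: 'controller/view',
  layout: 'my_layout',
  locals: { payroll: payroll, address: address }
})

# Rewrite relative URLs so Chromium can fetch stylesheets, images, and fonts.
absolute_html = Grover::HTMLPreprocessor.process(relative_html, 'http://my.server/', 'http')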
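
To make the idea of "relative to absolute" concrete, here is a small standalone illustration using Ruby's standard URI library. It is not Grover's implementation; the absolute_asset_url helper and the example hosts are hypothetical, and the snippet only shows how different kinds of asset references resolve against a base URL.

```ruby
require 'uri'

# Illustration only: resolves one asset reference against a base URL,
# which is roughly the kind of rewrite an HTML preprocessor performs.
def absolute_asset_url(reference, base_url)
  URI.join(base_url, reference).to_s
end

base_url = 'http://my.server/'

# Root-relative path: gains the scheme and host of the base URL.
puts absolute_asset_url('/assets/application.css', base_url)

# Protocol-relative reference: keeps its own host, gains the base scheme.
puts absolute_asset_url('//cdn.example.org/font.woff2', base_url)

# Already absolute: returned unchanged.
puts absolute_asset_url('http://other.host/logo.png', base_url)
```

The protocol-relative case is the reason a protocol is passed alongside the root URL in the call above.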

The second option is to pass a display_url, which replaces the default example.com host. Leaving the default in place can cause CORS errors, because the page then appears to be served from a host that differs from ours.

pdf = Grover.new(relative_html, display_url: base_url).to_pdf

Rendering from a URL

This approach involves specifying a URL to a web page along with the user’s cookies to make sure that Grover has permission to access the page we need to print. This method is useful for generating PDFs from existing web pages, such as product pages, blog posts, or help documentation.

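Grover.new(url, cookies: header_cookies).to_pdf

# Converts the request's Cookie header into the cookie hashes Grover expects.
def header_cookies
  request.headers['Cookie'].to_s.split('; ').map do |cookie|
    # Split on the first '=' only, because cookie values may contain '='.
    key, value = cookie.split('=', 2)

    { name: key, value: value, domain: request.headers['Host'] }
  end
end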

Comparison: which one should I use?

In this section, we'll take a closer look at each method and explore its strengths and weaknesses to help you determine which approach is best suited for your particular use case.

Rendering from a template

Advantages:
- Fine-grained control over the content and layout of the PDF document
- Can easily incorporate dynamic data and logic
- Easier to manage dependencies, such as stylesheets and fonts
- Can be faster and more efficient than rendering from a URL

Limitations:
- Requires development and maintenance of the template
- Can be less flexible and adaptable than rendering from a URL
- Need to prepare the data required for the rendering template

Rendering from a URL

Advantages:
- Allows for generating PDFs from existing web pages, such as product pages, blog posts, or help documentation
- Supports rendering of single-page applications (SPAs) and other JavaScript-heavy websites
- Can print any page without specifying the data needed for that page
- Grover takes care of all the rendering for you
- Print pages outside the application

Limitations:
- Requires both Puppeteer and Chromium to be installed on the server
- Requires passing user's cookies or session data for authorization
- May not work if cookies have expired or the user is not authenticated
- May require additional configuration to ensure security and reliability

We ran into some significant issues while implementing this feature. They are worth considering before you start:

No-sandbox Puppeteer

When using Grover with AWS, there may be a need to run Puppeteer in no-sandbox mode, which can cause security issues. The no-sandbox mode disables the sandboxing mechanism of Chromium that isolates the browser from the operating system, which means that malicious code can potentially harm the system.

One mitigation is to create a dedicated user on the server that is used only for exports, with restricted permissions and only the access needed to perform the export task. This improves the security of the system, but switching to this user can be difficult to configure, and the setup requires careful maintenance.
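
If no-sandbox mode is unavoidable, one way to keep it contained is to switch it on only in the environments that need it. The sketch below builds the Chromium flags for Grover's launch_args option from an environment variable. PDF_NO_SANDBOX and chromium_launch_args are hypothetical names invented for this example, not something Grover defines.

```ruby
# Chromium flags that disable the sandbox. Use them only where required.
NO_SANDBOX_FLAGS = ['--no-sandbox', '--disable-setuid-sandbox'].freeze

# Returns the launch arguments for the current environment.
# PDF_NO_SANDBOX is a hypothetical variable used for this sketch.
def chromium_launch_args(env = ENV)
  env['PDF_NO_SANDBOX'] == 'true' ? NO_SANDBOX_FLAGS.dup : []
end

# Possible usage with Grover (assumes `html` holds the rendered page):
#   Grover.new(html, launch_args: chromium_launch_args).to_pdf
```

With this in place, the sandbox stays enabled by default, and the decision to disable it is visible in the deployment configuration rather than buried in code.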

Authorization with cookies

When rendering from a URL, Grover needs the user's cookies, because many websites use cookies to store session data or other information that is necessary for rendering certain pages correctly. Without these cookies, Grover may not be able to access the necessary resources or data, which can result in incomplete or inaccurate PDF output (the user receives a PDF of the login page instead of the page they actually need).

When rendering pages synchronously, passing the user's cookies to Grover is usually straightforward. However, when pushing the job to a worker, it is more challenging to pass the cookies along with the request. This is because workers typically operate in a separate process or thread from the main application, which may not have access to the user's cookies or session data. Beyond that, what happens if the cookie has expired by the time the job executes? How can we retry the job without a newly created cookie?
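
One possible way to hand cookies to a worker is to convert the Cookie header into plain, JSON-friendly data in the controller and pass that as a job argument. The sketch below assumes the same name/value/domain shape as the header_cookies helper earlier. cookies_for_job is a hypothetical helper, and the approach does nothing about expiry and does place session cookies in the job queue.

```ruby
require 'json'

# Converts a raw Cookie header into an array of hashes.
# String keys keep the result JSON-serializable for job arguments.
def cookies_for_job(cookie_header, host)
  cookie_header.to_s.split('; ').map do |pair|
    # Split on the first '=' only, because values may contain '='.
    name, value = pair.split('=', 2)
    { 'name' => name, 'value' => value.to_s, 'domain' => host }
  end
end

# In the controller: serialize the cookies before enqueueing the job.
payload = JSON.generate(cookies_for_job('session=abc=123; theme=dark', 'my.server'))

# In the worker: restore symbol keys before handing the cookies to Grover.
cookies = JSON.parse(payload, symbolize_names: true)
# Grover.new(url, cookies: cookies).to_pdf
```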

Render as a background job

To achieve better performance and scalability, it may be necessary to render the PDF as a background job. This is especially true when dealing with many PDF files within a zip file. However, this requires handling the job as a service rather than within the controller, as it is an asynchronous service.

Rendering from a URL

One of the challenges of rendering the PDF from a URL is ensuring that the cookies still work when the job executes. Nobody can guarantee that: the user may have logged out before the job runs, or the cookies may have expired.

This problem is hard to resolve. We considered the following workarounds, but both seem likely to harm the security of our application:

- Customize some middleware to completely allow access when exporting PDF
- Create a super user that has privileges to access all pages

Rendering from a template

Rendering from a template can be a good alternative for rendering PDFs as a background job, as it doesn't have any significant issues. However, one remaining challenge is that the controller's helper methods are not available when rendering a template outside a request, so they have to be made available manually.

To overcome this challenge, we can build a view object ourselves and include the required helpers in it so that the template can use them. However, this method can be cumbersome and complicated.

view = ActionView::Base.new(EwaPayroll::PayrollsController.view_paths)

# Mix in the helpers the template relies on.
view.class_eval do
  include Rails.application.routes.url_helpers
  include PayrollHelper
  include FontAwesome::Rails::IconHelper
  include Pundit
end

Fortunately, Rails has a built-in feature called ActionController::Renderer, which can be an efficient and scalable solution for rendering PDFs in a background job. With this feature, arbitrary templates can be rendered outside of controller actions. This eliminates the complexity of manually handling the controller's context and helper methods.

To use ActionController::Renderer, simply call the render method on the renderer object, passing in the desired template name. For example:

PayrollsController.renderer.render(template: '...')

In combination with the Grover gem in Rails, this approach can provide a solid solution for rendering PDFs in the background.
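
The combination could be sketched as a small service object that a background job calls. The class below is plain Ruby: the renderer and the PDF converter are injected, so it runs without Rails or Chromium in tests. PdfExporter, its parameters, the template path, and the server URL are all assumptions made for illustration.

```ruby
# Plain Ruby service object. Collaborators are injected so it can be
# called from a background job and tested without Rails or Chromium.
class PdfExporter
  def initialize(renderer:, converter:, base_url:, layout: false)
    @renderer = renderer   # responds to render(template:, layout:, assigns:)
    @converter = converter # responds to call(html, display_url:)
    @base_url = base_url
    @layout = layout
  end

  # Renders the template to HTML, then converts the HTML to PDF data.
  def call(template:, assigns: {})
    html = @renderer.render(template: template, layout: @layout, assigns: assigns)
    @converter.call(html, display_url: @base_url)
  end
end

# In a Rails job, the wiring might look like this (names are assumptions):
#
#   exporter = PdfExporter.new(
#     renderer:  PayrollsController.renderer,
#     converter: ->(html, **options) { Grover.new(html, **options).to_pdf },
#     base_url:  'https://my.server/'
#   )
#   pdf = exporter.call(template: 'payrolls/show', assigns: { payroll: payroll })
```

Because nothing here depends on the request, the job only needs serializable arguments such as record IDs, which sidesteps the cookie problem described for URL rendering.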

Sang Huynh Thanh