mdbook-pdf: A mdBook backend for generating PDF files

hollowman6

Hollow Man

Posted on January 30, 2022


Introduction

mdBook lets you create a book from Markdown files. It is much like GitBook, but implemented in Rust. However, unlike GitBook, which supports generating PDFs with calibre, mdBook has long had no native PDF support, and adding it is not on the project's roadmap. Existing plugins (backends) such as mdbook-latex, which uses Tectonic, as well as pandoc-based solutions, generate a PDF that doesn't match the look of the HTML version mdBook produces. Considering these facts, I created an mdBook backend named mdbook-pdf that generates PDFs using headless Chrome and the Chrome DevTools Protocol method Page.printToPDF.

mdbook-pdf depends on Google Chrome / Microsoft Edge / Chromium. The generated pages look much like what you get by opening print.html in your browser and printing to PDF manually, or by executing google-chrome-stable --headless --print-to-pdf=output.pdf file:///path/to/print.html. On top of that, mdbook-pdf automates the process and lets you customize the paper orientation, the scale of the webpage rendering, the paper width and height, the page margins, the page ranges to generate, whether to display a header and footer and how they are formatted, and more. It supports every platform where Google Chrome / Microsoft Edge / Chromium works. You can check samples of the generated PDF files in the Artifacts here (the Rust book collection, generated on x86_64 Windows, macOS and Linux).
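For comparison, the manual command-line route mentioned above can be scripted. The following is only a minimal Python sketch of that manual approach, not how mdbook-pdf works internally; the function name `build_print_command` is made up for illustration, and the browser binary name is assumed to match the command quoted above.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_print_command(print_html: Path, output_pdf: Path,
                        browser: str = "google-chrome-stable") -> list[str]:
    """Build the argument list for a one-off headless print to PDF.

    `browser` is an assumption: use whatever binary name exists on your system.
    """
    return [
        browser,
        "--headless",
        f"--print-to-pdf={output_pdf}",
        # The browser expects a file:// URL, so convert the path to an absolute URI.
        print_html.resolve().as_uri(),
    ]


def print_to_pdf(print_html: Path, output_pdf: Path) -> None:
    """Run the browser once; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_print_command(print_html, output_pdf), check=True)
```

Calling `print_to_pdf(Path("path/to/print.html"), Path("output.pdf"))` would run the same command as above, with none of the customization options that mdbook-pdf adds.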

Installation & Usage

Since it's a plugin (backend) for mdBook, first make sure that mdbook is available.

If your machine's architecture is x86_64, or you are using Linux on ARM64, check the successful build runs of the GitHub Actions workflows, open the latest one, and download a binary from its Artifacts (available for Windows, Linux and macOS).

Otherwise, make sure a Rust build environment is available, then execute cargo install mdbook-pdf to compile and install it.

If you want to compile the latest version, make sure a Rust build environment is available (that is, cargo build works).
Then run git clone https://github.com/HollowMan6/mdbook-pdf.git, run cargo build --release in the cloned folder, take the executable from target/release/, and put it in your PATH.

To run it, you need Google Chrome / Chromium / Microsoft Edge available (installed at the default location, found in PATH, or with the binary location configured), because automatically downloading a Chromium binary isn't currently available (I will update this once upstream fixes that support).

  • On Windows 10 and above, the program can generate PDFs without installing any additional software, because Microsoft Edge ships with Windows. For older versions of Windows that don't include Edge, you can install Google Chrome.
  • On macOS, you need to install Google Chrome, Microsoft Edge or Chromium.
  • On Linux, you can install any of Google Chrome, Chromium or Microsoft Edge. Chromium is recommended. Its package is commonly named chromium or chromium-browser in Linux distributions (note: on Ubuntu releases later than 18.04, chromium-browser has to be installed through snap).
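To check whether a suitable browser can be found in PATH before building, something like the following Python sketch may help. The candidate binary names are assumptions that vary by platform and distribution, and this is not the lookup logic mdbook-pdf itself uses; it also does not look at default install locations outside PATH.

```python
from __future__ import annotations

import shutil

# Assumed binary names; adjust to what your platform actually installs.
CANDIDATES = (
    "chromium",
    "chromium-browser",
    "google-chrome-stable",
    "google-chrome",
    "microsoft-edge",
)


def find_browser(candidates: tuple[str, ...] = CANDIDATES) -> str | None:
    """Return the full path of the first candidate found in PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

If `find_browser()` returns `None`, the browser may still be installed at a default location that is not in PATH, so treat the result as a hint rather than a verdict.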

Make sure the following exists in your book.toml:



[output.html]

[output.pdf]



Also make sure [output.html.print] is not disabled (it is enabled by default, so don't worry if the following lines don't appear in your book.toml).



[output.html.print]
enable = true



The simplest book.toml looks like this:



[book]
title = "An Example"

[output.html]

[output.pdf]



Finally, build your book with the mdbook build command; the PDF file will be available at book/pdf/output.pdf.

An example of the output of the build process

Configuration

Check book.toml and its comments for details on the configuration options available under [output.pdf].

Credits

This project relies on headless_chrome. Because a new version has not been released yet, and the default timeout is not friendly to PDF generation, I publish my fork as mdbook-pdf-headless_chrome, with the relevant timeout extended to 300 seconds, and include it as a submodule of this project. This also allows the project to be published on Crates.io.

Some Notes and Thoughts

mdBook supports adding alternative backends. When the mdbook build command is invoked, if the book.toml in the book folder has an [output.pdf] table in addition to the default [output.html] used for generating the HTML pages, mdbook-pdf is called, and the relevant book information and configuration parameters are passed to the program's standard input as JSON. The relevant mdBook documentation can be found here.
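To make the stdin handoff concrete, here is a minimal sketch of what a backend does on startup, written in Python for brevity (mdbook-pdf itself is written in Rust). The JSON layout used here, with the book title under config.book.title and the backend's settings under config.output.pdf, is my assumption about the render context and should be checked against the mdBook documentation; the function names are made up for illustration.

```python
from __future__ import annotations

import json
import sys
from typing import Any


def read_pdf_config(raw_json: str) -> tuple[str, dict[str, Any]]:
    """Extract the book title and the [output.pdf] table from the render context.

    The key layout is an assumption, not taken from the mdBook source.
    """
    ctx = json.loads(raw_json)
    config = ctx.get("config", {})
    title = config.get("book", {}).get("title") or "(untitled)"
    pdf_cfg = config.get("output", {}).get("pdf", {})
    return title, pdf_cfg


def main() -> None:
    # mdBook writes the whole render context to the backend's standard input.
    title, pdf_cfg = read_pdf_config(sys.stdin.read())
    print(f"Rendering {title!r} with options: {pdf_cfg}", file=sys.stderr)
```

A real backend would go on to locate the generated print.html and drive the browser; this sketch stops at reading the configuration.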

Mechanism behind mdbook-pdf

A headless browser runs all of its operations in the background, without a graphical interface.

Before deciding to use the Chrome DevTools Protocol Page.printToPDF, I also tried wkhtmltopdf, which is based on Qt 4 WebKit. However, wkhtmltopdf doesn't seem to support the CSS @media print rule, so some extra components that should be hidden stay visible and get printed in the PDF.

I also tried making HTTP calls directly to the WebDriver from Python, based on the W3C WebDriver Protocol Print Page command, as well as using selenium to call the Chrome DevTools Protocol Page.printToPDF from Python. None of these methods is robust for large pages. They fail with the errors below, for the same reason as the original upstream headless_chrome: the default timeout is 10 seconds, which is not friendly to PDF generation for large pages.



timeout: Timed out receiving message from renderer: 10.000




selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
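For reference, the selenium route looks roughly like the sketch below. It is a reconstruction of the general approach rather than the exact script I used, it does nothing about the timeout problem described above, and the helper names are made up for illustration. The parameter names in `print_options` follow the Page.printToPDF method of the Chrome DevTools Protocol.

```python
from __future__ import annotations

import base64
from pathlib import Path
from typing import Any


def print_options(landscape: bool = False, scale: float = 1.0) -> dict[str, Any]:
    """Build a small subset of the Page.printToPDF parameters."""
    return {
        "landscape": landscape,
        "scale": scale,
        "printBackground": True,
        "displayHeaderFooter": False,
    }


def decode_pdf(result: dict[str, Any]) -> bytes:
    """Page.printToPDF returns the PDF as base64 text in the 'data' field."""
    return base64.b64decode(result["data"])


def print_with_selenium(url: str, output: Path) -> None:
    # Imported lazily so the helpers above work without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        result = driver.execute_cdp_cmd("Page.printToPDF", print_options())
        output.write_bytes(decode_pdf(result))
    finally:
        driver.quit()
```

On a small page this kind of call tends to work; the errors shown above appeared with large pages such as a whole book's print.html.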



At present, my tests show that Firefox cannot be used with headless_chrome for PDF generation, and Safari doesn't even support the W3C WebDriver Protocol Print Page command, let alone the Chrome DevTools Protocol.

I've noticed that for some books generated with this backend, clicking a link that points to another place inside the book opens an HTML page at the path where the book's HTML output was originally stored, just as in the issue mentioned here. I think these "internal" links should be handled on the mdBook side when print.html is generated (see here), so that every link pointing inside the book jumps within the generated print.html. Since all the content is already in print.html, there shouldn't be any hyperlinks that jump to other HTML files of the book. Resolved this way, links in the generated PDF would also jump within the PDF instead of opening a browser page that won't connect to anything. I have already created a PR for this and hope it gets merged soon.

In addition, I found that special characters such as : are very likely to appear in a book's title, and they would cause generation of the PDF file to fail. So the file name is output.pdf rather than <book name>.pdf.
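To illustrate the problem, the small Python sketch below reports which characters of a title could not be used in a file name. It assumes the set of characters that Windows reserves in file names; other platforms have different rules, and mdbook-pdf avoids the question entirely by always writing output.pdf. The function name `unsafe_chars` is made up for this example.

```python
from __future__ import annotations

# Characters that Windows does not allow in file names.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def unsafe_chars(title: str) -> list[str]:
    """Return the distinct forbidden characters found in a title, sorted."""
    return sorted({ch for ch in title if ch in WINDOWS_FORBIDDEN})
```

A title containing a colon would be flagged, while a plain title like "An Example" would not.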

Hope you enjoy it.
