Programmatically Rendering PDFs from HTML using Chrome and Puppeteer
Stout Systems
Posted on January 14, 2022
I've been a web developer for a long time, and one request that always comes up is turning HTML into a PDF. It's a pretty natural request: HTML is a nice display-oriented format with great layout and styling abilities. Plus, for a web app, I probably already have code written to produce the exact content I want. I just need to turn that content into a read-only, portable PDF for downloading or attaching to an email, without forcing the user to print to PDF manually.
I've used numerous technologies for doing that conversion, including WebSuperGoo's ABCpdf.net and Rotativa's free library. (We even have an article on how to use it.) The latter is still in use on some of my projects, but it (and its newer .NET Core version) is based on wkhtmltopdf.exe, which is a problem. More about that later.
Most of these tools work by using a browser to render the HTML and create the PDF document. The browser engine already knows how to render HTML (and use CSS and JavaScript), so why reinvent the wheel?
The issue with wkhtmltopdf, and any tool that uses it, is that it's based on the WebKit rendering engine used by Safari. If you haven't heard, WebKit is the new Internet Explorer in terms of holding back standards adoption and preventing us from having nice things. Okay, yes, that's bombastic hyperbole, but I've honestly run into more than my fair share of styling and layout issues in which wkhtmltopdf just won't render the way every other browser does. Maybe that's a function of wkhtmltopdf rather than WebKit itself, but either way, I still needed a solution.
The latest solution I've found is Puppeteer. It's a Node.js package that drives a headless (i.e., no browser window) version of Chrome. Chrome updates frequently, supports standards well, and renders well. The PDF output from Puppeteer matches the output you'd get from printing to PDF manually in Chrome pretty exactly.
Puppeteer supports great options like headers and footers (with template content for "Page N of X"), control of print margins, printing background images, different page sizes, and more. Since you are rendering in a real browser, you can even use JavaScript to generate dynamic content as needed.
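To make those options concrete, here is a minimal sketch of an options object for Puppeteer's `page.pdf()`. The field names and the `pageNumber`/`totalPages` template classes come from Puppeteer's PDF options; the helper name `buildPdfOptions`, the margins, and the styling are illustrative assumptions, not recommendations.

```javascript
// pdf-options.js: hypothetical helper that builds options for page.pdf().
// Field names are Puppeteer PDFOptions; the values are only examples.
function buildPdfOptions({ format = "Letter" } = {}) {
  return {
    format,                    // paper size, e.g. "Letter" or "A4"
    printBackground: true,     // include CSS background colors and images
    displayHeaderFooter: true, // required for the templates below to render
    // At print time Chrome fills elements carrying these class names
    // (title, pageNumber, totalPages) with the matching values.
    headerTemplate:
      '<div style="font-size:9px; width:100%; text-align:center;">' +
      '<span class="title"></span></div>',
    footerTemplate:
      '<div style="font-size:9px; width:100%; text-align:center;">' +
      'Page <span class="pageNumber"></span> of <span class="totalPages"></span>' +
      "</div>",
    // The header and footer are drawn inside these margins, so leave room.
    margin: { top: "0.75in", bottom: "0.75in", left: "0.5in", right: "0.5in" },
  };
}

module.exports = { buildPdfOptions };
```

The result is passed straight to `page.pdf()`. Templates are rendered in isolation from the page, so they carry their own inline styles rather than relying on your site's CSS.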
Implementation Details
Step one is to generate your HTML. There are two ways to hand it to Puppeteer: point it at a URL, or at a local (temp) file containing the HTML. The former might be convenient, but only if your web application allows anonymous access to the URL you want to print. That's not usually the case for me, so I often end up taking the local file route. The only catch there is to make sure any resources you reference, like CSS files or images, are publicly accessible, and that your HTML has the full URLs for them (or a <base> element in the head to specify the base URL for relative requests).
Step two is to call Puppeteer and tell it what to print, which options to use, and where to put the output PDF.
Step three is simply to wait for Puppeteer to finish rendering, grab the bytes of the file, and send them to the response pipeline. Clean up your temp files, and voilà!
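A minimal sketch of the local-file route, assuming Node.js and an HTML string that contains a `<head>` element. The helper names `addBaseElement` and `writeTempHtml` are hypothetical; the `file://` URL the second one returns is what you would hand to Puppeteer.

```javascript
// write-temp-html.js: hypothetical helpers for the "local temp file" route.
const fs = require("fs");
const os = require("os");
const path = require("path");
const crypto = require("crypto");
const { pathToFileURL } = require("url");

// Matches <head> or <head ...attrs>, but not <header>.
const HEAD_TAG = /<head(\s[^>]*)?>/i;

// Insert <base href="..."> right after the opening <head> tag so relative
// CSS and image URLs resolve against the live site, not the temp folder.
function addBaseElement(html, baseUrl) {
  if (!HEAD_TAG.test(html)) {
    throw new Error("HTML has no <head> element to hold a <base> tag");
  }
  // A replacer function avoids "$" in baseUrl being treated as a pattern.
  return html.replace(HEAD_TAG, (tag) => `${tag}<base href="${baseUrl}">`);
}

// Write the HTML to a uniquely named temp file. Returns the path (for
// cleanup later) and a file:// URL (for Puppeteer to load).
function writeTempHtml(html, baseUrl) {
  const name = `pdf-${crypto.randomBytes(8).toString("hex")}.html`;
  const filePath = path.join(os.tmpdir(), name);
  fs.writeFileSync(filePath, addBaseElement(html, baseUrl), "utf8");
  return { filePath, fileUrl: pathToFileURL(filePath).href };
}

module.exports = { addBaseElement, writeTempHtml };
```

The base URL is assumed to be trusted (it is inserted into the attribute without escaping), and the resources it points to still have to be reachable anonymously from the server.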
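Steps two and three might look roughly like the sketch below, assuming Puppeteer is installed next to the script. `renderPdfBytes` and `cleanUp` are hypothetical names, and `networkidle0` is one reasonable wait strategy among the several Puppeteer offers.

```javascript
// render-pdf.js: hypothetical sketch of rendering a page and returning bytes.
const fs = require("fs/promises");

async function renderPdfBytes(url, pdfPath, pdfOptions = {}) {
  // Required lazily so this file can be loaded without Puppeteer present.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch(); // headless by default
  try {
    const page = await browser.newPage();
    // "networkidle0" waits until there have been no network connections
    // for 500 ms, giving remote CSS, images, and fonts a chance to load.
    await page.goto(url, { waitUntil: "networkidle0" });
    await page.pdf({ ...pdfOptions, path: pdfPath });
  } finally {
    await browser.close(); // always release Chrome, even after an error
  }
  return fs.readFile(pdfPath); // a Buffer, ready to send in the response
}

// Delete temp files; { force: true } ignores files that don't exist.
async function cleanUp(...paths) {
  await Promise.all(paths.map((p) => fs.rm(p, { force: true })));
}

module.exports = { renderPdfBytes, cleanUp };
```

A caller would do something like `const bytes = await renderPdfBytes(fileUrl, pdfPath, options)`, send `bytes` back, then `await cleanUp(htmlPath, pdfPath)`. Closing the browser in `finally` matters on a server, where a leaked Chrome process per failed request adds up quickly.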
The Gory ASP.NET Details
Step two above leaves a lot unsaid! How do I actually call a Node.js package from my ASP.NET web application? Here are the steps I took, which are about the same for ASP.NET and ASP.NET Core (though a little easier in the latter).
Install Node.js on the web server. You'll be running a Node "application" there now, so you'll need it. You'll also need it on your development machine, but you might already have it, since it's part of many build toolchains now.
Create a folder in your project and on the server to hold the new Node scripts. This folder should be outside of the web application; it doesn't need to be served by your web server, so it shouldn't be web accessible.
Use npm to install Puppeteer and any other packages you need in that folder (separate from any npm packages you use elsewhere in your project). I do this on my dev machine, then copy package.json (and package-lock.json) to the server folder and run "npm install" there. I don't deploy the node_modules folder, though I suppose you could deploy it instead of package.json. Be aware that the modules needed for Chrome are 200+ MB, so you don't want them in your source repository!
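Because Puppeteer lives in its own folder, it can be worth confirming on the server that the package resolves from there rather than from some other `node_modules`. Here is a small hypothetical helper built on Node's `require.resolve` with its `paths` option; it is an assumption about how you might check, not part of the setup described above.

```javascript
// check-install.js: hypothetical deploy-time sanity check.
const path = require("path");

// Where would `packageName` be loaded from if a script in `dir` required it?
// Returns the resolved file path, or null if it can't be found from there.
function resolveFrom(dir, packageName) {
  try {
    return require.resolve(packageName, { paths: [path.resolve(dir)] });
  } catch (err) {
    return null; // typically MODULE_NOT_FOUND: npm install not run in dir
  }
}

module.exports = { resolveFrom };
```

For example, `resolveFrom(scriptsFolder, "puppeteer")` should return a path inside that folder's `node_modules`; `null` suggests `npm install` hasn't been run there yet.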
Write a simple JavaScript file that exports a function that runs Puppeteer with whatever inputs you want. The Puppeteer project has some samples that can serve as a starting point for this code.
Now the tricky part is calling that script from ASP.NET!
I used AspNetCore.NodeServices to manage the Node.js executable instance. Despite the name, older versions of that library run fine under ASP.NET (non-Core), though it has an embarrassing number of dependencies. (Annoying, but worth it in my case.) If you are using it from ASP.NET Core, it's pretty straightforward. From the older ASP.NET platform you'll need to jump through some small hoops to set up dependency injection, but they aren't too difficult (aside from the learning curve if you aren't familiar with it), and they don't need to impact the rest of your codebase.
That library has a nice, simple method to invoke your script and wait for it to finish. One gotcha is that your script function should take a "callback" function as its first argument, and call it with either an error or the success result, like so:
// AddNumbers.js
module.exports = async (callback, a, b) => {
  const rv = (parseInt(a, 10) + parseInt(b, 10)).toString();
  callback(null /* error */, rv /* success */);
};
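The example above never reports an error. Below is a minimal sketch of a hypothetical wrapper, `toCallbackExport`, that adapts any async function to the same callback-first convention and forwards exceptions as the error argument. It is an illustration, not part of AspNetCore.NodeServices.

```javascript
// callback-export.js: hypothetical adapter for the callback-first convention.
function toCallbackExport(fn) {
  return async (callback, ...args) => {
    let result;
    try {
      result = await fn(...args);
    } catch (err) {
      // Report the failure so the ASP.NET caller sees an error instead of
      // waiting on a callback that never arrives.
      callback(err instanceof Error ? err : new Error(String(err)));
      return;
    }
    // Called outside the try, so an exception thrown by the callback itself
    // is not mistaken for a failure of fn and reported a second time.
    callback(null, result);
  };
}

module.exports = toCallbackExport;
```

With it, the adding example becomes `module.exports = toCallbackExport(async (a, b) => (parseInt(a, 10) + parseInt(b, 10)).toString());`, and a Puppeteer script can simply throw instead of remembering to call `callback` on every failure path.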
Summary
I find it a strange architecture to install Node.js on the server and deploy a separate Node.js "application" that runs alongside my web application.
It's not that much worse than wkhtmltopdf solutions though, since they also need a large separate application to be deployed to the server. Those solutions just hide the complexity better through wrapper libraries.
Even if it's more complex, the results of this Node.js/Puppeteer/Chrome solution are worth it! I can use modern layouts and CSS (e.g., flexbox) to create PDFs that look as good as they do in the browser. And PDF generation is fast! Since Chrome updates frequently, I expect that this will be a solution I can support for years to come.
This is a technical/business article aimed at developers, hiring/project managers, and other technical staff looking to improve their skills.
Stout Systems is the software consulting and staffing company Fueled by the Most Powerful Technology Available: Human Intelligence®. We were founded in 1993 and are based in Ann Arbor, Michigan. We have clients across the U.S. in domains including engineering, scientific, manufacturing, education, marketing, entertainment, small business and robotics. We provide expert level software, Web and embedded systems development consulting and staffing services along with direct-hire technical recruiting and placements.