Converting HTML web pages into PDF
Akarsh Jaiswal
Posted on May 26, 2024
In this article, I will guide you through the straightforward process of converting HTML web pages into PDF documents using Puppeteer. This Node.js library provides a user-friendly API to control Chrome or Chromium over the DevTools Protocol.
Prerequisites
Before you start, make sure Node.js and npm are installed on your machine. Node.js is a JavaScript runtime built on Chrome’s V8 JavaScript engine, and npm is its package manager. If you don’t have them yet, download Node.js from the official website (https://nodejs.org/en/download); npm is included in the Node.js distribution.
You can verify the installation by running the following commands in your terminal:
node --version
npm --version
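If you prefer to check the version from inside Node.js, the following sketch parses the running version and compares it with a minimum. The helper names are hypothetical, and the minimum of 18 is an assumption based on recent Puppeteer releases; check the requirements of the version you install.

```javascript
// Assumed minimum Node.js major version; not taken from this article.
const ASSUMED_MIN_NODE_MAJOR = 18;

// Hypothetical helper: extracts the major version from "v20.11.1" or "20.11.1".
function majorVersion(version) {
  const match = /^v?(\d+)\./.exec(version);
  if (!match) {
    throw new Error(`Unrecognized version string: ${version}`);
  }
  return Number(match[1]);
}

// Hypothetical helper: true when the version meets the assumed minimum.
function isSupportedNode(version, minMajor = ASSUMED_MIN_NODE_MAJOR) {
  return majorVersion(version) >= minMajor;
}

// process.version is the version string of the running Node.js binary.
console.log(`Node ${process.version} supported: ${isSupportedNode(process.version)}`);
```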
Step 1: Initialize a new Node.js project
First, create a new directory for your project and navigate into it:
mkdir html-to-pdf-demo
cd html-to-pdf-demo
Then, initialize a new Node.js project by running:
npm init -y
This will create a new package.json file in your project directory.
Step 2: Install Puppeteer
Next, install Puppeteer by running:
npm install puppeteer
This also downloads a compatible browser build that Puppeteer controls, running it in headless mode by default (Chromium in older Puppeteer releases, Chrome for Testing in recent ones).
Step 3: Write the script
Create a new index.js file in your project directory and open it in your text editor. Then, paste the following code:
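const puppeteer = require('puppeteer');
const fs = require('fs');

async function printPDF() {
  // Launch a headless browser and open a new tab
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Load the page and wait until the network has gone idle
  await page.goto('http://marvel2950.github.io', { waitUntil: 'networkidle0' });
  // Render the loaded page as an A4 PDF
  const pdf = await page.pdf({ format: 'A4' });
  await browser.close();
  return pdf;
}

// Write the PDF data to output.pdf in the current directory
printPDF().then(pdf => {
  fs.writeFileSync('output.pdf', pdf);
});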
This script launches a new browser instance, opens a new page, navigates to http://marvel2950.github.io, and generates an A4 PDF of the rendered page. The { waitUntil: 'networkidle0' } option makes page.goto wait until there have been no open network connections for at least 500 ms, so late-loading resources are included before the PDF is generated.
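If navigation or PDF generation throws, the script never reaches browser.close() and the browser process can be left running. A minimal sketch of one way to guard against that is shown here, assuming the same launch, newPage, goto, and pdf calls as the script; renderPdf and its launcher parameter are hypothetical names, and the launcher is passed in so the function does not depend on how Puppeteer is loaded.

```javascript
// Hypothetical helper: renders `url` to PDF data and always closes the browser.
// `launcher` is any object with a launch() method, e.g. require('puppeteer').
async function renderPdf(launcher, url, pdfOptions = { format: 'A4' }) {
  const browser = await launcher.launch();
  try {
    const page = await browser.newPage();
    // Same wait condition as the script: network idle before rendering.
    await page.goto(url, { waitUntil: 'networkidle0' });
    return await page.pdf(pdfOptions);
  } finally {
    // Runs on success and on error, so no browser process is left behind.
    await browser.close();
  }
}
```

With Puppeteer installed, it could be called as renderPdf(require('puppeteer'), 'http://marvel2950.github.io') and the result written with fs.writeFileSync, as in the script.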
Step 4: Run the script
Finally, run the script from your project directory:
node index.js
And that’s it! The script writes a PDF document named output.pdf to your project directory, containing the rendered content of the HTML web page.
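A quick way to confirm that the output really is a PDF is to look at its first bytes, since PDF files begin with the signature %PDF-. The sketch below uses a hypothetical helper name, looksLikePdf, and in-memory sample data rather than the generated file.

```javascript
// Hypothetical helper: true when the data starts with the PDF signature "%PDF-".
// Accepts a Buffer or a plain Uint8Array.
function looksLikePdf(bytes) {
  const header = Buffer.from(bytes.subarray(0, 5)).toString('latin1');
  return header === '%PDF-';
}

console.log(looksLikePdf(Buffer.from('%PDF-1.7\n')));      // true
console.log(looksLikePdf(Buffer.from('<!DOCTYPE html>'))); // false
```

To check the generated file, pass the helper the result of require('fs').readFileSync('output.pdf').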