Generate PDFs from HTML via Puppeteer on AWS Lambda + API Gateway
zahaar
Posted on April 25, 2022
“Evil cannot create anything new, they can only corrupt and ruin what good forces have invented or made.” - JRR Tolkien.
Preface
Wouldn't it be great to be able to generate PDF files using HTML && CSS, without relying on overly complex drivers that depend on a whole bunch of C libraries?
While also supporting all the latest features of HTML5 && CSS3?
Well, we have great news. There is a library called Puppeteer that takes a relatively new Chrome feature and makes it accessible through a Node.js-based API.
Essentially, Puppeteer launches a Chromium browser instance in headless mode (without actually opening a window) and lets us drive the browser through a set of API commands: parse websites, retrieve images, generate PDFs, and so on, as if you had opened the HTML file in the latest browser version.
While we could build a Docker image that runs Puppeteer and deploy it on ECS or Heroku, creating a stable && optimized image can be quite challenging...
In contrast, running it in AWS Lambda is, IMHO, much simpler in terms of development speed, debugging, and monitoring. Besides, serverless is a nice concept for a POC (you pay for what you use).
Repo -> End Result
You can see the complete working example in this repo
Clone this repo: git clone https://github.com/zahaar/generate-pdf-lambda
Import the cURL command into Insomnia (Postman is not recommended, as it can't visualize PDFs).
Run make api-local to have a local API GW running.
Send the cURL request via Insomnia.
You can also invoke the Lambda directly, bypassing API GW, by supplying an example event in a file and running make invokation-local. The response will be a base64-encoded PDF binary.
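When the Lambda is invoked directly, the base64 string still has to be turned back into a file. Below is a minimal sketch of that step, assuming the function returns the usual API Gateway proxy shape (statusCode, isBase64Encoded, body). The helper name decodePdfResponse and the file names in the usage comment are hypothetical and not taken from the repo.

```javascript
// Hypothetical helper: turns a Lambda proxy-style response into a PDF Buffer.
// Assumes the response looks like { statusCode, isBase64Encoded, body }.
function decodePdfResponse(response) {
  if (response.statusCode !== 200) {
    throw new Error(`Lambda returned status ${response.statusCode}`);
  }
  const encoding = response.isBase64Encoded ? 'base64' : 'utf8';
  const pdf = Buffer.from(response.body, encoding);
  // Every PDF file starts with the "%PDF-" signature.
  if (pdf.subarray(0, 5).toString('latin1') !== '%PDF-') {
    throw new Error('Decoded body does not look like a PDF');
  }
  return pdf;
}

// Usage, assuming the invocation output was saved to response.json:
// const fs = require('fs');
// const response = JSON.parse(fs.readFileSync('response.json', 'utf8'));
// fs.writeFileSync('result.pdf', decodePdfResponse(response));
```

Checking the signature is a cheap way to catch the case where an error message comes back instead of a document.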
1. Set Up Local AWS SAM Template with Chrome Lambda Layer
In this step we complete the setup for local SAM execution. Once this is done, we will have a solid reference point.
The end version of this step can be fetched from 1_local-setup branch
We can create a basic SAM template by running sam init or by following a guide,
but our end goal should be a more complete structure like this:
├── Makefile
├── VERSION -- for VERSION tracking, helpful for CI
├── envs.json -- to set envs for local execution ( if necessary )
├── events
│ └── api-gw-event.json -- an example API GW event for local execution
├── src
│ └── app.js -- main source code file
└── template.yaml -- AWS SAM configuration template
app.js contains simple code that returns the same event.body it receives from the example event.
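For orientation, an echo handler of this kind could look roughly like the sketch below. It is not copied from the repo: the status code and the Content-Type header are assumptions, and only the "return event.body unchanged" behavior comes from the description above.

```javascript
// Minimal sketch of an echo handler; the headers are an assumption.
const handler = async (event) => ({
  statusCode: 200,
  headers: { 'Content-Type': 'text/plain' },
  body: event.body, // send back exactly what the example event carried
});

module.exports = { handler };
```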
template.yaml, meanwhile, holds the resource configuration for the API GW service:
...
...
  ApiGatewayApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: Staging
      BinaryMediaTypes:
        - application~1pdf # Note the support for the binary PDF media type
...
and for the Lambda, which, given our goal, is called PdfFunction.
Take note of the Layer used in this config. By using the chrome-aws-lambda Layer, we no longer need to declare puppeteer and Chrome as package.json dependencies or install them on the Docker image that AWS uses to run Lambdas, a step that can be quite challenging.
2. Configure Puppeteer in Lambda; Supply Template HTML
The next step is to program app.js to start Puppeteer, consume HTML from an API GW event, and return a base64-encoded response that API GW decodes before sending it to the client.
The end version of this step can be fetched from 2_generate-pdf branch
We need to change the Lambda handler code to something like this file (it is too long to be displayed here).
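Since the full file is not reproduced here, the following is a condensed sketch of the idea rather than the repository's code. It assumes the chrome-aws-lambda module from the Layer is passed in as chromium (which also makes the handler easy to exercise with a stub); createHandler, the A4 format, and the printBackground option are illustrative choices.

```javascript
// Sketch only: not the repository's actual app.js.
// `chromium` is expected to be the chrome-aws-lambda module provided by the Layer.
function createHandler(chromium) {
  return async (event) => {
    // API Gateway may deliver the request body base64-encoded.
    const html = event.isBase64Encoded
      ? Buffer.from(event.body, 'base64').toString('utf8')
      : event.body;

    let browser = null;
    try {
      browser = await chromium.puppeteer.launch({
        args: chromium.args, // flags tuned for the Lambda environment
        defaultViewport: chromium.defaultViewport,
        executablePath: await chromium.executablePath,
        headless: chromium.headless,
      });
      const page = await browser.newPage();
      // Wait until network activity settles so fonts and images are loaded.
      await page.setContent(html, { waitUntil: 'networkidle0' });
      const pdf = await page.pdf({ format: 'A4', printBackground: true });

      return {
        statusCode: 200,
        headers: { 'Content-Type': 'application/pdf' },
        isBase64Encoded: true, // tells API Gateway the body is base64
        body: Buffer.from(pdf).toString('base64'),
      };
    } finally {
      // Always release the browser, even when rendering fails.
      if (browser !== null) await browser.close();
    }
  };
}

// In the Lambda itself (requires the chrome-aws-lambda Layer):
// exports.handler = createHandler(require('chrome-aws-lambda'));
```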
Key takeaways are:
The browser launch args in this example are set specifically for AWS Lambda compatibility.
To test this code, an HTML template is needed. We will use this open-source one for demonstration.
The document is sent as the request body with 'Content-Type: text/html'.
Please note the 'Accept: application/pdf' header; this is important.
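The two headers above can also be set from code instead of Insomnia. Here is a small sketch using the global fetch available in Node 18+. The POST method, the function names, and the endpoint URL you pass in are assumptions; only the two header values come from the text. As far as I understand, the Accept header matters because API Gateway only converts the base64 body back to binary when the requested type matches one of the configured binary media types.

```javascript
// Builds fetch options for the PDF endpoint (method is an assumption).
function buildPdfRequest(html) {
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'text/html', // the HTML document travels as the raw body
      Accept: 'application/pdf', // ask API Gateway for the binary PDF
    },
    body: html,
  };
}

// Hypothetical usage with Node 18+ (global fetch); the URL is a placeholder
// for wherever your local or deployed API GW endpoint lives.
async function requestPdf(url, html) {
  const res = await fetch(url, buildPdfRequest(html));
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  return Buffer.from(await res.arrayBuffer());
}
```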