Toan Huynh
Posted on September 4, 2024
Introduction
- In this blog post, we’ll show how to create a Lambda function that generates a PDF file from HTML using Puppeteer and uploads the PDF to Amazon S3.
- We’ll also explore how to deploy Chromium as an AWS Lambda layer.
- This blog post uses the AWS Cloud Development Kit (AWS CDK) and the AWS Command Line Interface (AWS CLI) to simplify the deployment of AWS resources.
- You can download the code for this blog post from the GitHub repository. To deploy it to your AWS account, follow the instructions in the README file.
Architecture
In this post, a client sends a request to a Lambda function, which generates a PDF and saves it to an S3 bucket. The architecture looks like this:
Prerequisites
- An AWS account with appropriate permissions.
- AWS CLI - see the AWS documentation to install it and configure your credentials.
- AWS CDK v2 - install the AWS CDK CLI globally (npm install -g aws-cdk).
- Node.js 18+ and npm - Download Link.
Initialize the Project with AWS CDK in TypeScript
Run this command to create a folder for our app and navigate into it:
mkdir cdk-typescript-lambda-chromium && cd cdk-typescript-lambda-chromium
Then, initialize a blank AWS CDK project in TypeScript by running:
cdk init app --language typescript
This generates a project structure with the required directories, files, and dependencies. Below is the project structure:
In the project structure, there are a few main files to focus on:
- cdk.json: Contains information that the toolkit uses to run our app. In our case, the app command is npx ts-node --prefer-ts-exts bin/cdk-typescript-lambda-chromium.ts.
- bin/cdk-typescript-lambda-chromium.ts: The entry point of the CDK application. It loads the stack we define in lib/cdk-typescript-lambda-chromium-stack.ts.
- lib/cdk-typescript-lambda-chromium-stack.ts: This is where the CDK application's main stack is defined. We'll spend most of our time here creating the Lambda function.
Create S3 bucket
First, we’ll create a new S3 bucket to store the Chromium layer and the generated PDF files, using the AWS CLI command below:
aws s3 mb s3://YOUR_S3_BUCKET_NAME
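YOUR_S3_BUCKET_NAME must be a globally unique name that follows the S3 bucket naming rules, and the same name is reused for the layer upload later. As a quick pre-check, here is a minimal sketch of a hypothetical bucketNameProblems helper. It covers only a subset of the general purpose bucket rules (length, allowed characters, first and last character, adjacent periods, IP-address format), so the AWS documentation remains the authority.

```typescript
// Hypothetical helper: checks a SUBSET of the S3 general purpose bucket naming rules.
// Returns a list of problems; an empty list means the name passed these checks.
function bucketNameProblems(bucketName: string): string[] {
  const problems: string[] = [];
  if (bucketName.length < 3 || bucketName.length > 63) {
    problems.push('must be between 3 and 63 characters long');
  }
  if (!/^[a-z0-9.-]+$/.test(bucketName)) {
    problems.push('may contain only lowercase letters, numbers, periods and hyphens');
  }
  if (!/^[a-z0-9]/.test(bucketName) || !/[a-z0-9]$/.test(bucketName)) {
    problems.push('must begin and end with a letter or number');
  }
  if (bucketName.includes('..')) {
    problems.push('must not contain two adjacent periods');
  }
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(bucketName)) {
    problems.push('must not be formatted as an IP address');
  }
  return problems;
}

// Example usage with made-up candidate names
for (const candidate of ['my-pdf-bucket', 'My_PDF_Bucket', 'ab']) {
  const problems = bucketNameProblems(candidate);
  console.log(candidate, problems.length === 0 ? 'OK' : problems);
}
```

Global uniqueness cannot be checked locally; aws s3 mb reports an error if the name is already taken.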
Setting up Puppeteer and Chrome on AWS Lambda
Next, we’ll use puppeteer-core, the Puppeteer library without a bundled browser. To provide the browser inside the AWS Lambda function, we'll utilize the chromium package (@sparticuz/chromium) by Sparticuz - GitHub, a compressed Chromium build intended for AWS Lambda.
This package will be bundled into a Lambda layer, a convenient way to reuse code across multiple Lambda functions. Compared to the off-the-shelf puppeteer package, which downloads its own Chrome build, this keeps the function's deployment package small, which can lead to faster deployments.
Run the following Linux commands to build the zipped Chromium layer:
git clone --depth=1 https://github.com/sparticuz/chromium.git && \
cd chromium && \
make chromium.zip
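This creates a chromium.zip file. Next, upload it to the S3 bucket with the AWS CLI, which stores it for future use and lets us publish the Lambda layer directly from S3 (this also avoids the 50 MB limit on .zip archives uploaded directly to Lambda):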
bucketName="YOUR_S3_BUCKET_NAME" && \
versionNumber="127" && \
aws s3 cp chromium.zip "s3://${bucketName}/chromiumLayers/chromium${versionNumber}.zip" && \
aws lambda publish-layer-version --layer-name chromium --description "Chromium v${versionNumber}" --content "S3Bucket=${bucketName},S3Key=chromiumLayers/chromium${versionNumber}.zip" --compatible-runtimes nodejs --compatible-architectures x86_64
When the command runs successfully, the output will look like this:
Create a Lambda Function Stack with AWS CDK
Then, we’ll define our CDK stack, which will contain the AWS resources required for the PDF-generating function.
Before that, we’ll install a package called dotenv and create a config.ts file in lib to load environment variables from the .env file. Install the package using:
npm install dotenv
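The post does not show config.ts itself, only the getRegion(), getS3BucketArn() and getChromiumLayerArn() calls made by the stack. Below is one possible shape, as a minimal sketch. It assumes the .env file defines REGION, S3_BUCKET_NAME and CHROMIUM_LAYER_ARN (these variable names and the createConfig function are hypothetical). In the real file, `import 'dotenv/config'` at the top loads .env into process.env; the sketch leaves that import out and accepts an env object so it runs on its own.

```typescript
// lib/config.ts (sketch). In the project, add `import 'dotenv/config'` as the first line
// so dotenv copies the values from .env into process.env before they are read.

type Env = Record<string, string | undefined>;

// Hypothetical variable names: match them to whatever your .env file defines.
function createConfig(env: Env = process.env) {
  // Fail fast with a readable message instead of passing `undefined` to CDK.
  const required = (key: string): string => {
    const value = env[key];
    if (!value) {
      throw new Error(`Missing environment variable ${key} (check your .env file)`);
    }
    return value;
  };

  return {
    getRegion: () => required('REGION'),
    // In the standard `aws` partition, an S3 bucket ARN has no region or account ID.
    getS3BucketArn: () => `arn:aws:s3:::${required('S3_BUCKET_NAME')}`,
    getChromiumLayerArn: () => required('CHROMIUM_LAYER_ARN'),
  };
}

// Example with placeholder values (in lib/config.ts you would export createConfig() instead)
const exampleConfig = createConfig({
  REGION: 'us-east-1',
  S3_BUCKET_NAME: 'my-pdf-bucket',
  CHROMIUM_LAYER_ARN: 'arn:aws:lambda:us-east-1:123456789012:layer:chromium:1',
});
console.log(exampleConfig.getS3BucketArn()); // arn:aws:s3:::my-pdf-bucket
```

Deriving the bucket ARN from the bucket name keeps .env down to a single value for the bucket.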
We'll use the AWS CDK to define the infrastructure as code. The AWS CDK Stack defines all the AWS resources used by the application.
AWS CDK helps create repeatable deployments quickly and reduces human error from clicking around the console.
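First, reference the existing Chromium layer by its layer version ARN:
// Imports at the top of the stack file (the ./config import is assumed); the code below goes inside the stack constructor
import * as cdk from 'aws-cdk-lib'
import * as iam from 'aws-cdk-lib/aws-iam'
import * as lambda from 'aws-cdk-lib/aws-lambda'
import { config } from './config'
const chromiumLayer = lambda.LayerVersion.fromLayerVersionArn(
  this,
  'chromiumLayerStack',
  config.getChromiumLayerArn()
)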
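fromLayerVersionArn expects a layer version ARN, which ends in the version number. The publish-layer-version output contains both LayerArn (without the version) and LayerVersionArn, and pasting the wrong one into .env is an easy mistake. Here is a minimal sketch of a hypothetical parseLayerVersionArn check that config.getChromiumLayerArn() could run before returning the value; the sample ARNs use a placeholder account ID.

```typescript
interface LayerVersionParts {
  partition: string;
  region: string;
  accountId: string;
  layerName: string;
  version: number;
}

// Hypothetical helper: splits arn:<partition>:lambda:<region>:<account>:layer:<name>:<version>
// and throws if the string is not in that shape (for example, when the version is missing).
function parseLayerVersionArn(arn: string): LayerVersionParts {
  const match = /^arn:([a-z-]+):lambda:([a-z0-9-]+):(\d{12}):layer:([A-Za-z0-9_-]+):(\d+)$/.exec(arn);
  if (!match) {
    throw new Error(`Not a Lambda layer version ARN: ${arn}`);
  }
  const [, partition, region, accountId, layerName, version] = match;
  return { partition, region, accountId, layerName, version: Number(version) };
}

// Example with a placeholder account ID
const parts = parseLayerVersionArn('arn:aws:lambda:us-east-1:123456789012:layer:chromium:1');
console.log(parts.layerName, parts.version); // chromium 1
```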
Next, we’ll define the Lambda function, which runs on Node.js 18 with 1,024 MB (1 GB) of memory. The code and handler properties indicate where the code is located and which exported function Lambda calls (handler in index.js). The timeout is set to 300 seconds (5 minutes). The function also references the Chromium layer it needs and receives the deployment region through the REGION environment variable:
const lambdaFunction = new lambda.Function(this, 'lambdaNodeStack', {
  code: lambda.Code.fromAsset('src/generating-pdf/lib'),
  functionName: 'generatingPdfLambda',
  handler: 'index.handler',
  memorySize: 1024,
  runtime: lambda.Runtime.NODEJS_18_X,
  description: 'Convert html to PDF for users to download',
  environment: {
    REGION: config.getRegion(),
  },
  timeout: cdk.Duration.seconds(300),
  layers: [chromiumLayer],
})
Since our Lambda function needs to read and write objects in S3, we'll grant it permission to get and put objects in the bucket. Granting only these two actions on this one bucket follows the principle of least privilege:
const bucketArn = config.getS3BucketArn()
lambdaFunction.addToRolePolicy(
  new iam.PolicyStatement({
    effect: iam.Effect.ALLOW,
    actions: ['s3:GetObject', 's3:PutObject'],
    resources: [bucketArn, `${bucketArn}/*`],
  })
)
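To see what this grants, it helps to write out the equivalent IAM policy JSON. s3:GetObject and s3:PutObject are object-level actions, so they take effect on object ARNs of the form `${bucketArn}/*`; listing the bare bucket ARN as well is harmless but not needed for these two actions. Below is a minimal sketch, with a hypothetical pdfBucketPolicy helper and bucket name, of the object-only version of the statement. It illustrates the policy shape and is not the exact JSON CDK synthesizes.

```typescript
interface PolicyStatement {
  Effect: 'Allow' | 'Deny';
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: '2012-10-17';
  Statement: PolicyStatement[];
}

// Hypothetical helper: read/write access to the objects in one bucket, nothing else.
function pdfBucketPolicy(bucketArn: string): PolicyDocument {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Action: ['s3:GetObject', 's3:PutObject'],
        // Object-level actions match object ARNs: "<bucket ARN>/<object key>".
        Resource: [`${bucketArn}/*`],
      },
    ],
  };
}

console.log(JSON.stringify(pdfBucketPolicy('arn:aws:s3:::my-pdf-bucket'), null, 2));
```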
Finally, we create a Lambda function URL by calling the addFunctionUrl() method on our Lambda function instance. We pass in two options:
- authType: the authentication mode required to invoke the function. We use FunctionUrlAuthType.NONE to make the function publicly accessible, so anyone with the function URL can invoke it.
- cors: we set allowedOrigins to ['*'] to allow requests from all origins.
We also add a CfnOutput so that the URL is printed after deployment:
const myFunctionUrl = lambdaFunction.addFunctionUrl({
  authType: lambda.FunctionUrlAuthType.NONE,
  cors: {
    allowedOrigins: ['*'],
  },
})
// Print the function URL when `cdk deploy` finishes
new cdk.CfnOutput(this, 'LambdaNodeUrl', {
  value: myFunctionUrl.url,
})
Define the Lambda Function Code
In the project structure, we’ll create a folder called src/generating-pdf to store the function code for the Node.js runtime (the stack above deploys the contents of src/generating-pdf/lib).
The entire Lambda function code can be found here. The main code for generating the PDF is:
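// Imports at the top of the handler file; @sparticuz/chromium is resolved from the Chromium layer
import chromium from '@sparticuz/chromium'
import puppeteer from 'puppeteer-core'
import { PutObjectCommand, S3Client } from '@aws-sdk/client-s3'
.....
// Launch a headless Chrome browser using puppeteer
const browser = await puppeteer.launch({
  args: chromium.args,
  defaultViewport: chromium.defaultViewport,
  executablePath: await chromium.executablePath(),
  headless: chromium.headless,
})
let buffer: Uint8Array
try {
  // Open a new page in the browser
  const page = await browser.newPage()
  // Set the HTML content on the new page
  await page.setContent(requestBody.html)
  // Generate an A4 PDF with 50px margins
  buffer = await page.pdf({
    format: 'A4',
    margin: { bottom: '50px', top: '50px', left: '50px', right: '50px' },
  })
} finally {
  // Close the browser (and its pages) even if rendering fails, so Chromium does not linger
  await browser.close()
}
// Upload the PDF to the S3 bucket (s3Client and bucketName are set up in the elided code)
await s3Client.send(new PutObjectCommand({
  Bucket: bucketName,
  Key: `${Date.now()}.pdf`,
  Body: buffer,
  ContentType: 'application/pdf',
}))
.....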
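The function's package.json declares these dependencies; @sparticuz/chromium is not listed because the Chromium layer provides it at runtime:
"dependencies": {
  "@aws-sdk/client-s3": "^3.637.0",
  "@aws-sdk/s3-request-presigner": "^3.637.0",
  "puppeteer-core": "^23.2.0"
}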
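The handler snippet elides how requestBody, the object key and the response are produced. With a function URL, Lambda passes an HTTP event whose body is a string that may be base64-encoded, which the isBase64Encoded flag indicates. Below is a minimal, dependency-free sketch of those edges. The helper names (parseHtml, pdfKey, objectUrl, jsonResponse), the pdfs/ key prefix and the local event and result types are hypothetical and not taken from the post's repository.

```typescript
// Minimal subset of the function URL event that the handler reads.
interface FunctionUrlEvent {
  body?: string;
  isBase64Encoded: boolean;
}

interface FunctionUrlResult {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: extracts the "html" field from the JSON request body.
function parseHtml(evt: FunctionUrlEvent): string {
  if (!evt.body) {
    throw new Error('Request body is empty');
  }
  const raw = evt.isBase64Encoded ? Buffer.from(evt.body, 'base64').toString('utf8') : evt.body;
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== 'object' || parsed === null || typeof (parsed as { html?: unknown }).html !== 'string') {
    throw new Error('Expected a JSON body like {"html": "<h1>...</h1>"}');
  }
  return (parsed as { html: string }).html;
}

// Hypothetical key scheme: timestamp-based key with a .pdf extension.
function pdfKey(now: Date = new Date()): string {
  return `pdfs/${now.getTime()}.pdf`;
}

// Virtual-hosted-style object URL. Objects are private by default, so this URL only works
// for callers with read access; a presigned URL is the usual way to share the file.
function objectUrl(bucket: string, region: string, key: string): string {
  const encodedKey = key.split('/').map(encodeURIComponent).join('/');
  return `https://${bucket}.s3.${region}.amazonaws.com/${encodedKey}`;
}

function jsonResponse(statusCode: number, payload: unknown): FunctionUrlResult {
  return {
    statusCode,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  };
}

// Example: a base64-encoded body, as Lambda may deliver it
const sampleEvent: FunctionUrlEvent = {
  body: Buffer.from(JSON.stringify({ html: '<h1>Hello</h1>' })).toString('base64'),
  isBase64Encoded: true,
};
const key = pdfKey(new Date(0));
console.log(parseHtml(sampleEvent)); // <h1>Hello</h1>
console.log(jsonResponse(200, { url: objectUrl('my-pdf-bucket', 'us-east-1', key) }).body);
```

Keeping these pieces as plain functions means they can be unit-tested without launching Chromium or calling S3.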
Deploy the Lambda Function on AWS
First, run the cdk synth command, which generates the CloudFormation template from the AWS CDK code. This command also validates the stack definition and raises an error if there are any issues.
Next, run cdk bootstrap to create the CDKToolkit CloudFormation stack, which holds the resources CDK needs for deployments (such as a staging S3 bucket and IAM roles). This is only required once per account and region.
Finally, deploy our project with the cdk deploy command. After running this command, we'll see the output below. Because the stack adds IAM permissions, CDK asks for confirmation; press y to continue.
After the deployment completes, the LambdaNodeUrl output shows the Lambda function's URL, which we can call to generate PDFs.
Test the Lambda Function with the Function URL
To test the function, we'll send a POST request to the Lambda function URL with a JSON body containing an HTML string, like this:
curl --location 'LAMBDA_FUNCTION_URL' \
--header 'Content-Type: application/json' \
--data '{
"html": "<h1>HELLO WORLD: THIS IS GENERATING PDF LAMBDA</h1>\n<p><img src=\"https://cdn.britannica.com/77/234477-050-DF90E2ED/Doberman-pinscher-dog.jpg\" alt=\"\" width=\"691\" height=\"496\"></p>"
}
'
This should generate a PDF from the provided HTML, save it to the S3 bucket, and return the URL where the PDF was saved.
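The same request can be sent from TypeScript with the fetch API built into Node.js 18+. Below is a minimal sketch; the FUNCTION_URL environment variable and the helper names are our own conventions, and the response is printed as-is because its exact shape depends on the handler.

```typescript
interface PdfRequestInit {
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Builds the same POST request as the curl command above.
function buildPdfRequest(html: string): PdfRequestInit {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ html }),
  };
}

// Sends the HTML to the function URL and returns the raw response body.
async function requestPdf(functionUrl: string, html: string): Promise<string> {
  const response = await fetch(functionUrl, buildPdfRequest(html));
  if (!response.ok) {
    throw new Error(`Lambda returned HTTP ${response.status}: ${await response.text()}`);
  }
  return response.text();
}

// Only call the real endpoint when FUNCTION_URL is set (hypothetical variable name).
const functionUrl = process.env.FUNCTION_URL;
if (functionUrl) {
  requestPdf(functionUrl, '<h1>HELLO WORLD: THIS IS GENERATING PDF LAMBDA</h1>')
    .then((body) => console.log(body))
    .catch((err) => {
      console.error(err);
      process.exitCode = 1;
    });
}
```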
Conclusion
In this blog post, we walked through deploying a Lambda function with Puppeteer and Chromium for PDF generation and explored setting up AWS resources using AWS CDK. This is a scalable solution to generate PDFs in the cloud.
I hope this post is helpful to you. Thank you for reading!
Reference
- https://github.com/aws-samples/cdk-typescript-lambda/tree/main
- https://blog.tericcabrel.com/create-lambda-function-with-node-js-and-typescript-on-aws-cdk/
- https://github.com/Sparticuz/chromium
- https://www.pluralsight.com/resources/blog/cloud/serverless-browser-automation-with-aws-lambda-and-puppeteer