Piloting Puppeteer with PureScript - Part 1
Mike Solomon
Posted on February 5, 2021
tl;dr Here's the GitHub repo showing all this in action.
Functional languages are not often the off-the-shelf choice when working with I/O-intensive asynchronous tasks like piloting a headless browser. I find, though, that this is a place where functional programming shines. In addition to helping guarantee the correctness of the code (i.e. no pesky null-s or undefined-s), it provides a step-by-step framework that helps reason about what is going on.
In this series of articles, I'd like to show you how you can pilot Puppeteer on AWS Lambda using PureScript. I hope that, by the end, you'll see how functional programming can be a good fit for these sorts of tasks.
Comparing Puppeteer JS to Puppeteer PureScript
Below is a snippet showing how to drive Puppeteer with the chrome-aws-lambda package, copied from its README and edited a bit for clarity.
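const chromium = require('chrome-aws-lambda');

// launchBrowser is not shown in the original snippet. This version is an
// assumption based on the launch options in the chrome-aws-lambda README.
const launchBrowser = (executablePath) =>
  chromium.puppeteer.launch({
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    executablePath,
    headless: chromium.headless,
  });

exports.handler = async (event, context, callback) => {
  // Both variables must be mutable so that catch/finally can see them.
  let result = null;
  let browser = null;
  try {
    const executablePath = await chromium.executablePath;
    browser = await launchBrowser(executablePath);
    const page = await browser.newPage();
    await page.goto(event.url || 'https://example.com');
    result = await page.title();
  } catch (error) {
    return callback(error);
  } finally {
    // Close the browser whether or not the steps above succeeded.
    if (browser !== null) {
      await browser.close();
    }
  }
  return callback(null, result);
};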
Compare that to the PureScript version.
handler ::
  Foreign ->
  Foreign ->
  LambdaCallback ->
  Effect Unit
handler event context callback =
  launchAff_
    $ bracket
        (executablePath >>= launchBrowser)
        close
        ( \browser -> do
            page <- newPage browser
            goto page "https://example.com"
            title page
        )
    >>= liftEffect
    <<< resolveCallback callback
Comparing the two, we can see that the basic flow is much the same:
- An instance of a browser is created.
- A new page is created.
- The page navigates to example.com.
- The lambda returns the title.
One immediate benefit of PureScript compared to vanilla JS is type safety: if you write goto page 42, the program won't compile. This is the case in TypeScript, Elm, and Reason as well. Using strongly-typed languages helps prevent bugs where you accidentally pass an invalid value and have to sort through error logs later on, when headless Chrome can't navigate to 42 and crashes with error code 127.
Aff
An additional benefit of PureScript, and the main focus of this article, is the Aff monad. Aff-s are asynchronous, fiber-based computations in a monadic context. This endows them with several superpowers, like the ability to be forked, joined, and spawned, all of which is clunky in JS/TS.
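For comparison, here is a rough sketch of what forking and joining tends to look like with plain promises. It is only an analogy: the delay helper is a hypothetical stand-in for real work, and a promise starts running the moment it is created and cannot be cancelled, which is part of why this feels clunky next to fibers.

```javascript
// Hypothetical stand-in for a slow asynchronous task.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

async function forkAndJoin() {
  // "Fork": create both promises without awaiting, so they run concurrently.
  const first = delay(20, 'first');
  const second = delay(10, 'second');
  // "Join": wait for each result, in the order the tasks were forked.
  return [await first, await second];
}
```

Calling forkAndJoin() resolves once both tasks have finished, regardless of which one completed first.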
Aff-s can also be used to reason about how resources are used - how they're allocated, how they're released, and what they're used to make. This is done with the function bracket. Let's take a look at its signature:
bracket :: Aff a -> (a -> Aff Unit) -> (a -> Aff b) -> Aff b
bracket acquire release use = ...
acquire is where you create a resource, release is where you clean it up irrespective of what happens when it's used, and use is where a is used to create a b. This is a bit like try/catch/finally, but it has several advantages:
- It forces us to write the cleanup code that finally leaves optional.
- It distinguishes between failure in the use stage and failure in the acquire stage, whereas try clumps these two together.
- It always returns an Aff of type b, which makes it easier to do the next step as a continuation - in this case, the lambda callback. Compare this to the JavaScript above, where result has to be declared mutable outside the try block so that it can reach callback, which is an invitation for disaster.
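To make the three stages concrete, here is a minimal promise-based sketch of the same shape in JavaScript. It is not how Aff implements bracket (for one thing, it ignores fiber cancellation), and the fake resource and the demo function are hypothetical names used only for illustration.

```javascript
// bracket(acquire, release, use): a promise-based sketch of the same shape.
async function bracket(acquire, release, use) {
  // If acquire fails there is nothing to clean up, so release is never called.
  const resource = await acquire();
  try {
    // The value produced by use becomes the value of the whole bracket.
    return await use(resource);
  } finally {
    // Runs whether use succeeded or threw.
    await release(resource);
  }
}

// Hypothetical resource that records what happens to it.
async function demo() {
  const log = [];
  const acquire = async () => {
    log.push('acquire');
    return { name: 'fake-browser' };
  };
  const release = async (resource) => {
    log.push('release ' + resource.name);
  };

  // Successful use: the result flows out as a plain value.
  const title = await bracket(acquire, release, async () => 'Example Domain');

  // Failing use: the error propagates, but release has already run.
  let failure = null;
  try {
    await bracket(acquire, release, async () => {
      throw new Error('navigation failed');
    });
  } catch (error) {
    failure = error.message;
  }
  return { title, failure, log };
}
```

In both runs of demo the log shows a release for every acquire, and the caller never needs a mutable variable to get the title out.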
I find that the last point is the most important one. When I write a lambda in JS or TS, it's hard to remember to call the callback, and doing so often requires passing the callback around to lots of internal functions. Here, by using Aff, the callback is always the last thing called and it is called with an immutable result (here, the outcome of bracket).
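The same discipline can be approximated in JavaScript by keeping all the work in one function that either returns a value or throws, and only then touching the callback. The sketch below assumes a hypothetical makeHandler wrapper and a fake browser object so that it runs without Chrome; neither comes from the original code.

```javascript
// Hypothetical stub standing in for a real Puppeteer browser.
function makeFakeBrowser(log) {
  return {
    newPage: async () => {
      let url = null;
      return {
        goto: async (target) => {
          url = target;
          log.push('goto ' + target);
        },
        title: async () => 'Title of ' + url,
      };
    },
    close: async () => {
      log.push('close');
    },
  };
}

// Builds a Lambda-style handler around any function that launches a browser.
const makeHandler = (launch) => async (event, context, callback) => {
  // All the work lives here: it either returns a title or throws.
  const getTitle = async () => {
    const browser = await launch();
    try {
      const page = await browser.newPage();
      await page.goto(event.url || 'https://example.com');
      return await page.title();
    } finally {
      await browser.close();
    }
  };
  // The callback is the last thing called, exactly once, with a value
  // that nothing else can reassign.
  return getTitle().then(
    (title) => callback(null, title),
    (error) => callback(error)
  );
};

// Usage with the stub:
//   const handler = makeHandler(async () => makeFakeBrowser([]));
//   handler({}, {}, (error, title) => console.log(error, title));
```

No internal function ever sees the callback here, which is the property the Aff version gets for free from its types.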
Given all the stuff that can go wrong when running a headless browser on a serverless function executing on bare metal somewhere in Ireland, it's nice to know that the orchestration of acquiring and releasing assets in an asynchronous context is predictable thanks to a rock-solid type system. And not just nice for us - it's nice for our users as well! This helps guarantee that Meeshkan users have smooth tests and videos on the Meeshkan service, both of which are produced on headless Chrome on AWS Lambda.
In the next article, we'll look at how to use type classes in PureScript to enforce consistent patterns in the writing of asynchronous code.