User Agent string difference in Puppeteer headless and headful
Sony AK
Posted on November 29, 2019
Today I will talk about the User Agent difference when we running Puppeteer in headless and headful mode.
For people not familiar with Puppeteer, Puppeteer is a Node library that provides many high-level API to control the headless Chrome or Chromium over DevTools protocol. You can go to https://pptr.dev/ for more details.
Puppeteer in headless mode means you control Chrome or Chromium browser without displaying the browser UI. In the opposite, Puppeteer in headful mode will display the browser UI and this is useful for debugging.
As mentioned here https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent, User Agent string is a characteristic string that allows the network protocol peers to identify the application type, operating system, software vendor or software version of the requesting software user agent.
Web browser send User-Agent request header when we browse a web pages on the internet. Here is sample of my User Agent.
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36
Preparation
Install Puppeteer with this command.
npm i puppeteer
The code
OK now let's create a code to show User Agent string when running Puppeteer in headless mode.
File puppeteer_headless.js
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
console.log(await browser.userAgent());
await browser.close();
})();
Run it.
node puppeteer_headless.js
On my machine it will display like below.
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/79.0.3945.0 Safari/537.36
Please notice there is sub string HeadlessChrome
there.
OK now let's create a code to show User Agent string when running Puppeteer in headful mode.
File puppeteer_headful.js
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({ headless: false });
console.log(await browser.userAgent());
await browser.close();
})();
Run with
node puppeteer_headful.js
On my machine it will display like below.
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.0 Safari/537.36
Now we can see that this User Agent string is similar like normal web browser User Agent string.
Why this is interesting? Suppose you want to scrap a website using Puppeteer in headless mode and the target website put a protection by detecting the User Agent string (blocking ChromeHeadless) then your scraping activity might be blocked.
How to set User Agent on headless Chrome
Anyway we still can set User Agent string in Puppeteer headless mode, it will override the default headless Chrome User Agent string.
Here is the code sample.
File puppeteer_set_user_agent.js
const puppeteer = require('puppeteer');
(async () => {
// prepare for headless chrome
const browser = await puppeteer.launch();
const page = await browser.newPage();
// set user agent (override the default headless User Agent)
await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36');
// go to Google home page
await page.goto('https://google.com');
// get the User Agent on the context of Puppeteer
const userAgent = await page.evaluate(() => navigator.userAgent );
// If everything correct then no 'HeadlessChrome' sub string on userAgent
console.log(userAgent);
await browser.close();
})();
It will display User Agent that we already set before we browse to Google web page.
Thank you and I hope you enjoy it.
Posted on November 29, 2019
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.