The Puppeteer Language Experiment

zirkelc

Chris Cook

Posted on May 21, 2024

If you're using Puppeteer for automation, tests, or web scraping, you've likely run into the question of how to set the browser's language. Controlling the language explicitly is crucial because the language on your local system might differ from that of the remote system where you run Puppeteer, such as CI (e.g., GitHub Actions) or a serverless environment (e.g., AWS Lambda or Cloudflare Workers).

To my surprise, how to do that is not well documented. There are a bunch of options, and you have to figure out which one works best for you. After spending a considerable amount of time on research, I’ve compiled a list of options. I have validated each option against BrowserLeaks, which shows all available information about your browser and its supported features, including the locale and the accepted languages, which ultimately determine the content you’re going to see.

What Language?

There are two ways to determine the requester's language: client-side and server-side. On the client side, with JavaScript, you have the navigator.language and navigator.languages properties, and the Intl.DateTimeFormat().resolvedOptions() method. The values do not necessarily match each other because navigator.language seems to use the language from Chrome settings, while Intl.DateTimeFormat().resolvedOptions() reflects the operating system's language. On the server side, you have the Accept-Language header that is sent with every HTTP request from the browser. The browser uses the navigator.languages property to fill the header values.
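
To make the client-side signals concrete, here is a minimal sketch that collects the three values from a navigator-like object. The helper name readClientLanguage is hypothetical and only used for illustration; in a page you would pass the global navigator.

```javascript
// Hypothetical helper (not part of any library): collects the client-side
// language signals described above from a navigator-like object.
function readClientLanguage(nav) {
  return {
    // Preferred language reported by the browser, e.g. "en-GB"
    language: nav.language,
    // Ordered list of preferred languages
    languages: [...nav.languages],
    // Locale that Intl resolves when no locale is requested explicitly
    intlLocale: Intl.DateTimeFormat().resolvedOptions().locale,
  };
}

// In a browser console: console.log(readClientLanguage(navigator));
```

Comparing language with intlLocale is a quick way to spot the mismatch mentioned above.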

Side note: I use the term language to refer to the first part (de) of a locale like de-DE.

Note that a server has several ways to localize a website: it can return the content in the language reflected by the HTTP Accept-Language header, it can use JavaScript to detect the language and redirect the user to a localized URL (think of example.com/de-DE/), or it can load the actual content dynamically via JavaScript based on the navigator properties.
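
As an illustration of the header-based variant, here is a rough sketch of how a server might pick a language from Accept-Language. The functions parseAcceptLanguage and pickLanguage are hypothetical names made up for this example, and the matching is deliberately simplified to the primary subtag; real servers often rely on a library for this.

```javascript
// Hypothetical helpers for illustration only.

// Parse "en-GB,en-US;q=0.9,en;q=0.8" into tags sorted by descending weight.
function parseAcceptLanguage(header) {
  return header
    .split(',')
    .map((part) => {
      const [tag, ...params] = part.trim().split(';');
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
      const q = qParam ? Number.parseFloat(qParam.slice(2)) : 1; // missing q means weight 1
      return { tag: tag.trim(), q: Number.isNaN(q) ? 0 : q };
    })
    .filter((entry) => entry.tag !== '' && entry.q > 0)
    .sort((a, b) => b.q - a.q); // stable sort keeps header order for equal weights
}

// Pick the first supported language, matching on the primary subtag ("de" of "de-DE").
function pickLanguage(header, supported, fallback) {
  for (const { tag } of parseAcceptLanguage(header)) {
    const primary = tag.split('-')[0].toLowerCase();
    if (supported.includes(primary)) return primary;
  }
  return fallback;
}

console.log(pickLanguage('en-GB,en-US;q=0.9,en;q=0.8', ['de', 'en'], 'en')); // prints "en"
console.log(pickLanguage('de-DE', ['de', 'en'], 'en')); // prints "de"
```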

Set Up Puppeteer

I'm using the puppeteer package, which automatically downloads a recent version of Chrome for Testing (I think it used to be Chromium). There is also puppeteer-core if you want to manage the browser installation yourself or if the environment already provides a browser.

To start the browser, all we need to do is call the launch function without any arguments. Puppeteer provides sensible defaults.

import puppeteer from "puppeteer";

const LANG = 'de-DE';

const browser = await puppeteer.launch();
const page = await browser.newPage();

I defined the constant LANG as the target language for the browser. In the next few steps, I'll show you how to apply this setting.

Command line argument --lang

Chrome (and Chromium) provides a plethora of command line arguments. The argument --lang can be used to set the language on startup. Keep in mind it must be merged with the default args from Puppeteer.

console.log(`Using --lang=${LANG}`);

const args = [...puppeteer.defaultArgs(), `--lang=${LANG}`];
const browser = await puppeteer.launch({
  args,
});

const page = await browser.newPage();

Environment variable LANG

While the environment variable is not officially documented, there are references to it on the Chromium bug tracker and the Puppeteer repository. However, this seems to work only on Linux. Even worse, setting this environment variable causes the browser to fail to start on certain operating systems.

console.log(`Using env.LANG=${LANG}`);

const browser = await puppeteer.launch({
  env: {
    LANG,
  }
});

const page = await browser.newPage();
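
One detail worth knowing here: Puppeteer's env launch option defaults to process.env, so an object that contains only LANG replaces the browser's whole environment rather than adding to it. Whether that plays a role in the startup failures is an assumption and was not verified in this experiment. A minimal sketch that keeps the existing variables looks like this (buildLaunchEnv is a hypothetical helper name):

```javascript
// Hypothetical helper: builds the value for Puppeteer's `env` launch option
// while keeping the variables of the current process (PATH, HOME, DISPLAY, ...).
function buildLaunchEnv(lang, baseEnv = process.env) {
  // Spread first, then override, so LANG always wins over the inherited value
  return { ...baseEnv, LANG: lang };
}

// Usage with the launch call from above:
// const browser = await puppeteer.launch({ env: buildLaunchEnv(LANG) });
```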

HTTP header Accept-Language

Additional HTTP headers like Accept-Language can be set on every page request. This doesn't affect the browser language itself, but the requested website may return the content in the requested language if it respects this header.

console.log(`Sending HTTP header Accept-Language: ${LANG}`);

const browser = await puppeteer.launch();

const page = await browser.newPage();
await page.setExtraHTTPHeaders({
  'Accept-Language': LANG,
});
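
Real browsers usually send a weighted list instead of a single value, as the en-GB,en-US;q=0.9,en;q=0.8 default shows. If you want the header to look similar, a sketch like the following could build it. The helper buildAcceptLanguage and the weighting scheme (steps of 0.1) are my own assumptions, not something Puppeteer provides.

```javascript
// Hypothetical helper: turns an ordered list of language tags into an
// Accept-Language value with descending q-weights.
function buildAcceptLanguage(tags) {
  return tags
    .map((tag, index) => {
      if (index === 0) return tag; // the first entry has the implicit weight q=1
      const q = Math.max(0.1, 1 - index * 0.1); // 0.9, 0.8, ... with a floor of 0.1
      return `${tag};q=${q.toFixed(1)}`;
    })
    .join(',');
}

console.log(buildAcceptLanguage(['de-DE', 'de', 'en'])); // prints "de-DE,de;q=0.9,en;q=0.8"

// Usage with the page from above:
// await page.setExtraHTTPHeaders({ 'Accept-Language': buildAcceptLanguage(['de-DE', 'de', 'en']) });
```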

Override navigator.language

Overriding the navigator.language property on a page might seem like a hack, but it is actually an official example in the Puppeteer docs.

console.log(`Overriding navigator.language=${LANG}`);

const browser = await puppeteer.launch();

const page = await browser.newPage();
await page.evaluateOnNewDocument((lang) => {
  Object.defineProperty(navigator, 'language', {
    get() {
      return lang;
    },
  });
  Object.defineProperty(navigator, 'languages', {
    get() {
      return [lang];
    },
  });
}, LANG);
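
A possible variation, sketched under my own assumptions: wrap the override in a named function that also adds the base language (de for de-DE) to navigator.languages, similar to what a real browser typically reports. The name overrideNavigatorLanguage and the optional target parameter are hypothetical; the parameter only exists so the function can be tried on a plain object outside the browser.

```javascript
// Hypothetical helper: overrides language and languages on a navigator-like object.
// In the page, `target` falls back to the global navigator.
function overrideNavigatorLanguage(lang, target = navigator) {
  const base = lang.split('-')[0];
  // Report ["de-DE", "de"] for "de-DE", but only ["de"] for "de"
  const languages = Object.freeze(base === lang ? [lang] : [lang, base]);
  Object.defineProperty(target, 'language', { get: () => lang, configurable: true });
  Object.defineProperty(target, 'languages', { get: () => languages, configurable: true });
}

// Try it on a plain object:
const fakeNavigator = {};
overrideNavigatorLanguage('de-DE', fakeNavigator);
console.log(fakeNavigator.language, fakeNavigator.languages); // prints: de-DE [ 'de-DE', 'de' ]

// Usage with the page from above:
// await page.evaluateOnNewDocument(overrideNavigatorLanguage, LANG);
```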

Chrome DevTools Protocol Network.setUserAgentOverride

Puppeteer uses the Chrome DevTools Protocol (CDP) to communicate with Chrome. The Network.setUserAgentOverride method allows setting acceptLanguage as the browser language to emulate. Note that userAgent is a required parameter, but we keep the default value.

console.log(`Using CDP Network.setUserAgentOverride(acceptLanguage: ${LANG})`);

const browser = await puppeteer.launch();

const page = await browser.newPage();

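const cdpSession = await page.createCDPSession();
// Await the command so the override is active before the first navigation
await cdpSession.send('Network.setUserAgentOverride', {
  userAgent: await browser.userAgent(), // required parameter; keep the default value
  acceptLanguage: LANG,
});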

Results

I ran all options sequentially and extracted the values for navigator.language, navigator.languages, and Intl.DateTimeFormat().resolvedOptions() from BrowserLeaks JS and the Accept-Language header from BrowserLeaks IP. Here are the results for each option:

System:  darwin/23.1.0, arm64
Browser: Chrome/125.0.6422.60

Using --lang=de-DE
        Internationalization Locale: en-GB
        Navigator Language: en-GB
        HTTP Accept-Language: en-GB,en-US;q=0.9,en;q=0.8

Using env.LANG=de-DE
        Internationalization Locale: en-GB
        Navigator Language: en-GB
        HTTP Accept-Language: en-GB,en-US;q=0.9,en;q=0.8

Sending HTTP header Accept-Language: de-DE
        Internationalization Locale: en-GB
        Navigator Language: en-GB
        HTTP Accept-Language: de-DE

Overriding navigator.language=de-DE
        Internationalization Locale: en-GB
        Navigator Language: de-DE
        HTTP Accept-Language: en-GB,en-US;q=0.9,en;q=0.8

Using CDP Network.setUserAgentOverride(acceptLanguage: de-DE)
        Internationalization Locale: en-GB
        Navigator Language: de-DE
        HTTP Accept-Language: de-DE

As you can see, the results vary significantly. None of the options impacted the locale returned from Intl.DateTimeFormat().resolvedOptions(). Surprisingly, even the --lang option didn't affect the browser language as reflected by navigator.language. This is unexpected: I believe older versions of Chrome/Chromium, or Chrome on other operating systems, respected this flag. Using the Network.setUserAgentOverride command from CDP appears to be the most reliable way to control the browser language, despite being the least documented option.
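
To make the comparison easier to scan, the following sketch tallies which signals reported the target language for each option. The observed values are copied from the output above; the helper matchedSignals and the short option labels are hypothetical and only exist for this summary.

```javascript
// Observed values copied from the results above (darwin/arm64, Chrome 125).
const TARGET = 'de-DE';
const DEFAULT_HEADER = 'en-GB,en-US;q=0.9,en;q=0.8';
const observed = [
  { option: '--lang', intl: 'en-GB', navigator: 'en-GB', header: DEFAULT_HEADER },
  { option: 'env.LANG', intl: 'en-GB', navigator: 'en-GB', header: DEFAULT_HEADER },
  { option: 'Accept-Language header', intl: 'en-GB', navigator: 'en-GB', header: 'de-DE' },
  { option: 'navigator override', intl: 'en-GB', navigator: 'de-DE', header: DEFAULT_HEADER },
  { option: 'CDP setUserAgentOverride', intl: 'en-GB', navigator: 'de-DE', header: 'de-DE' },
];

// Hypothetical helper: lists which signals of one run report the target language.
function matchedSignals(run, target) {
  // Only the first (highest-priority) tag of the header is compared
  const firstHeaderTag = run.header.split(',')[0].split(';')[0].trim();
  const signals = [];
  if (run.intl === target) signals.push('intl');
  if (run.navigator === target) signals.push('navigator');
  if (firstHeaderTag === target) signals.push('header');
  return signals;
}

for (const run of observed) {
  console.log(`${run.option}: ${matchedSignals(run, TARGET).join(', ') || 'none'}`);
}
// --lang: none
// env.LANG: none
// Accept-Language header: header
// navigator override: navigator
// CDP setUserAgentOverride: navigator, header
```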

Contribution

I implemented this project as a reproducible test suite that one can run locally with minimal setup. If you're interested, please check the repository and run the tests on your local machine. I'm curious to see if the results differ across various operating systems or browser versions.

Puppeteer Language Experiment

This project tests how the browser language can be changed with Puppeteer. It implements multiple options to set the language of Chrome and checks each option against BrowserLeaks to see how it affects the JavaScript properties and HTTP headers exposed by the browser. For more information, see my article The Puppeteer Language Experiment on DEV.to.

Usage

Clone this repository to your local machine and install the dependencies with npm. The puppeteer package automatically downloads Chrome to a temporary folder.

npm install

Then start the test:

npm test

The test will run each option and print its result:

System:   darwin/23.1.0, arm64
Language: en-US
Browser:  Chrome/125.0.6422.60
Using --lang=de-DE
        Internationalization Locale: en-GB
        Navigator Language: en-GB
        HTTP Accept-Language: en-GB,en-US;q=0.9,en;q=0.8

Using env.LANG=de-DE
        Internationalization Locale: en-GB
        Navigator Language: en-GB
        HTTP Accept-Language: en-GB,en-US;q=0.9,en;q=0.8

Sending HTTP header Accept-Language: de-DE
        Internationalization Locale: en-GB
        Navigator Language: en-GB
        HTTP Accept-Language: de-DE

Overriding navigator.language=de-DE
        Internationalization Locale: en-GB
        Navigator