If you use Puppeteer for automation, testing, or web scraping, you have likely wondered how to set the browser's language. Controlling the language explicitly matters because the language of your local system may differ from that of the remote system where Puppeteer eventually runs, such as a CI service (e.g., GitHub Actions) or a serverless environment (e.g., AWS Lambda or Cloudflare Workers).
To my surprise, it’s not well documented how to do that. There are several options, and you have to figure out which one works best for you. After spending a considerable amount of time on research, I’ve compiled a list of them. I validated each option against BrowserLeaks, which shows all available information about your browser and its supported features, including the locale and the accepted languages, which ultimately determine the content you’re going to see.
What Language?
There are two ways to determine the requester's language: client-side and server-side. On the client side, with JavaScript, you have the navigator.language and navigator.languages properties, and the Intl.DateTimeFormat().resolvedOptions() method. The values do not necessarily match each other because navigator.language seems to use the language from Chrome settings, while Intl.DateTimeFormat().resolvedOptions() reflects the operating system's language. On the server side, you have the Accept-Language header that is sent with every HTTP request from the browser. The browser uses the navigator.languages property to fill the header values.
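To make the client-side signals concrete, here is a minimal sketch that collects all three values in one object. The function name readClientLanguage is hypothetical and not part of the article's project; it takes any navigator-like object so it can also run outside a browser.

```javascript
// Collects the client-side language signals discussed above.
// `nav` is any navigator-like object; in a browser, pass `navigator`.
// readClientLanguage is a hypothetical helper, not part of the article's project.
function readClientLanguage(nav) {
  return {
    // Preferred language of the browser.
    language: nav.language,
    // Ordered list of preferred languages (copied into a plain array).
    languages: [...(nav.languages ?? [])],
    // Locale that Intl resolves by default for date formatting.
    intlLocale: Intl.DateTimeFormat().resolvedOptions().locale,
  };
}

// In a browser console: console.log(readClientLanguage(navigator));
```

Comparing language with intlLocale in the returned object is a quick way to see whether the two sources disagree on a given machine.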
Side note: I use the term language to refer to the first part of a locale, for example de in de-DE.
Note that a server has several ways to act on this information: it can return the content in the language indicated by the Accept-Language header, it can use JavaScript to detect the language and redirect the user to a localized URL (think of example.com/de-DE/), or it can load the actual content dynamically via JavaScript based on the navigator properties.
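As an illustration of the first, header-based case, the following sketch shows how a server might choose a language from an Accept-Language value. Both function names are hypothetical, and real servers may apply more elaborate matching rules; this only covers the common tag;q=value syntax.

```javascript
// Parses an Accept-Language header into language tags ordered by preference.
// Hypothetical helper for illustration; it handles the common `tag;q=0.x` form.
function parseAcceptLanguage(header) {
  return header
    .split(',')
    .map((part, index) => {
      const [tag, ...params] = part.trim().split(';');
      const q = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
      // A missing q-value means the highest preference (1).
      const quality = q ? Number.parseFloat(q.slice(2)) : 1;
      return { tag: tag.trim(), quality: Number.isNaN(quality) ? 0 : quality, index };
    })
    .filter((entry) => entry.tag !== '' && entry.quality > 0)
    // Higher q first; ties keep the order in which they appeared.
    .sort((a, b) => b.quality - a.quality || a.index - b.index)
    .map((entry) => entry.tag);
}

// Returns the first supported language (first part of the locale) or the fallback.
function pickLanguage(header, supported, fallback) {
  for (const tag of parseAcceptLanguage(header)) {
    const language = tag.split('-')[0].toLowerCase();
    if (supported.includes(language)) return language;
  }
  return fallback;
}
```

With this sketch, a site that supports only en and de would serve German for a header such as de-DE,de;q=0.9,en;q=0.8 and fall back to its default for a header that lists neither language.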
Set Up Puppeteer
I'm using the puppeteer package, which automatically downloads a recent version of Chrome for Testing (older releases downloaded Chromium instead). There is also puppeteer-core if you want to manage the browser installation yourself or if the environment already provides a browser.
To start the browser, we only need to call the launch function without any arguments. Puppeteer provides all the defaults.
I defined the constant LANG as the target language for the browser. In the next few steps, I'll show you how to apply this setting.
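A minimal sketch of this setup might look as follows. The value 'de-DE' is an assumed example (the article does not state the exact value used), and launchDefault is a hypothetical wrapper name.

```javascript
// Target language for the browser. 'de-DE' is an assumed example value.
const LANG = 'de-DE';

// Starts a browser with Puppeteer's defaults and opens a blank page.
// launchDefault is a hypothetical wrapper name used for illustration.
async function launchDefault() {
  // Loaded lazily so this file can be parsed without Puppeteer installed.
  const { default: puppeteer } = await import('puppeteer');
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  return { browser, page };
}

// Usage:
//   const { browser, page } = await launchDefault();
//   ...
//   await browser.close();
```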
Command line argument --lang
Chrome (and Chromium) supports a plethora of command line arguments. The --lang argument sets the language on startup. Keep in mind that it has to be passed in addition to Puppeteer's default arguments rather than replacing them.
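One way to pass the flag is sketched below. As far as I know, Puppeteer documents the args launch option as additional arguments on top of its defaults, so the sketch only supplies the extra flag; if you disable the defaults with ignoreDefaultArgs, you would have to combine the flag with puppeteer.defaultArgs() yourself. The helper names withLangArg and launchWithLangArg are hypothetical.

```javascript
// Returns the extra launch arguments with exactly one --lang flag.
// withLangArg is a hypothetical helper, not part of Puppeteer.
function withLangArg(extraArgs, lang) {
  const others = extraArgs.filter((arg) => !arg.startsWith('--lang='));
  return [...others, `--lang=${lang}`];
}

// Launches the browser with the --lang flag set to `lang`.
async function launchWithLangArg(lang) {
  // Loaded lazily so this file can be parsed without Puppeteer installed.
  const { default: puppeteer } = await import('puppeteer');
  // `args` is passed in addition to Puppeteer's default arguments.
  return puppeteer.launch({ args: withLangArg([], lang) });
}
```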
Environment variable LANG
The LANG environment variable can be passed to the browser process on launch. While it is not officially documented, there are references to it in the Chromium bug tracker and the Puppeteer repository. However, it seems to work only on Linux. Even worse, setting it causes the browser to fail to start on certain operating systems.
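Passing the variable could look like the sketch below, which uses Puppeteer's env launch option. The helper names are hypothetical. Whether Chrome expects a value like de-DE or a POSIX-style locale such as de_DE is not something the article confirms, so treat the value format as an assumption to verify.

```javascript
// Returns a copy of the environment with LANG set, leaving the original untouched.
// withLangEnv is a hypothetical helper for illustration.
function withLangEnv(baseEnv, lang) {
  return { ...baseEnv, LANG: lang };
}

// Launches the browser with the LANG environment variable set to `lang`.
async function launchWithLangEnv(lang) {
  // Loaded lazily so this file can be parsed without Puppeteer installed.
  const { default: puppeteer } = await import('puppeteer');
  // Keep the rest of the environment (PATH, HOME, ...) so the browser can still start.
  return puppeteer.launch({ env: withLangEnv(process.env, lang) });
}
```

Spreading process.env first is deliberate: the env option replaces the environment seen by the browser, so dropping the existing variables is a plausible cause of startup problems.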
HTTP header Accept-Language
Additional HTTP headers like Accept-Language can be set on every page request. This doesn't affect the browser language itself, but the requested website may return the content in the requested language if it respects this header.
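With Puppeteer, this can be done through page.setExtraHTTPHeaders, as in the following sketch. The wrapper name setAcceptLanguageHeader is hypothetical.

```javascript
// Sends the given Accept-Language header with every request made by `page`.
// setAcceptLanguageHeader is a hypothetical wrapper around page.setExtraHTTPHeaders.
async function setAcceptLanguageHeader(page, lang) {
  await page.setExtraHTTPHeaders({ 'Accept-Language': lang });
}

// Usage (before navigating):
//   await setAcceptLanguageHeader(page, 'de-DE');
//   await page.goto('https://example.com');
```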
Chrome DevTools Protocol
Puppeteer uses the Chrome DevTools Protocol (CDP) to communicate with Chrome. The Network.setUserAgentOverride method allows setting acceptLanguage as the browser language to emulate. Note that userAgent is a required parameter, but we keep the default value.
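A sketch of this approach follows. It assumes a recent Puppeteer version where page.createCDPSession() is available (older releases exposed it as page.target().createCDPSession()), and it reads the default user agent from browser.userAgent() so that only the language changes. The helper names are hypothetical.

```javascript
// Builds the parameters for Network.setUserAgentOverride.
// userAgent is required by the protocol, so the current value is passed through.
function buildUserAgentOverride(userAgent, lang) {
  return { userAgent, acceptLanguage: lang };
}

// Overrides the emulated browser language for `page` via CDP.
// setLanguageViaCdp is a hypothetical helper name.
async function setLanguageViaCdp(browser, page, lang) {
  const client = await page.createCDPSession();
  // Keep the browser's default user agent unchanged.
  const userAgent = await browser.userAgent();
  await client.send(
    'Network.setUserAgentOverride',
    buildUserAgentOverride(userAgent, lang),
  );
}

// Usage (before navigating):
//   await setLanguageViaCdp(browser, page, 'de-DE');
```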
Results
I ran all options sequentially and extracted the values of navigator.language, navigator.languages, and Intl.DateTimeFormat().resolvedOptions() from BrowserLeaks JS, and the Accept-Language header from BrowserLeaks IP. Here are the results for each option:
As you can see, the results vary significantly. None of the options impacted the locale returned from Intl.DateTimeFormat().resolvedOptions(). Surprisingly, even the --lang option didn't affect the browser language as reflected by navigator.language. This is unexpected, as I believe in older versions of Chrome/Chromium or on different operating systems, this flag would be respected. Using the Network.setUserAgentOverride command from CDP appears to be the most reliable way to control the browser language, despite being the least documented option.
Contribution
I implemented this project as a reproducible test suite that one can run locally with minimal setup. If you're interested, please check the repository and run the tests on your local machine. I'm curious to see if the results differ across various operating systems or browser versions.
This project tests how the browser language can be changed with Puppeteer. It implements multiple options to set the language of Chrome and checks each option against BrowserLeaks to see how it affects the JavaScript properties and HTTP headers exposed by the browser. For more information, see my article The Puppeteer Language Experiment on DEV.to.
Usage
Clone this repository to your local machine and install the dependencies with npm. The puppeteer package automatically downloads Chrome to a temporary folder.
npm install
Then start the test:
npm test
The test will run each option and print its result: