Megan Lee
Posted on November 21, 2024
Written by Antonello Zanini✏️
One of the biggest challenges for Node.js automation scripts is getting blocked by anti-bot measures. No site likes bots, so many web servers implement bot protection solutions to stop them.
The key to identifying bots lies in examining low-level network details, as most HTTP clients do not use the same underlying connection libraries as browsers. This is where [curl-impersonate](https://github.com/lwthiker/curl-impersonate) comes in.
As a customized build of curl, it adopts the same low-level network libraries as popular browsers, making its requests nearly identical to those of legitimate users.
In this article, you will learn what curl-impersonate is and how to use it in Node.js to bypass bot detection in web scraping and automation scripts. If you want to read more about how curl normally works before jumping into this piece, take a gander at our introduction guide.
How to use curl-impersonate in Node.js
curl-impersonate is a specialized build of curl that can impersonate real-world browsers. Unlike standard curl, it adjusts request headers, TLS fingerprints, and other parameters to make its requests closely resemble those from browsers like Chrome, Firefox, and Safari.
By doing so, curl-impersonate helps fool anti-bot mechanisms into thinking that your automated request is coming from a normal browser rather than an HTTP client. This makes the project useful for web scraping or any situation where a site might otherwise restrict or block access from automated tools.
curl-impersonate is available through Docker images, so you can use it as a command in your terminal at the OS level. Additionally, the project provides the libcurl-impersonate library, which opens the door to bindings in multiple programming languages, including Node.js. Let’s now see how to use curl-impersonate in a Node.js script!
Install curl-impersonate
The npm registry lists a few Node.js bindings for the curl-impersonate project.
While none of these options clearly stands out from the others, [node-curl-impersonate](https://www.npmjs.com/package/node-curl-impersonate) is one of the most reliable choices. It is written in TypeScript, actively maintained, receives frequent updates, and has been under continuous development for over a year.
Add node-curl-impersonate to your project’s dependencies with the following command:
npm install node-curl-impersonate
Note: node-curl-impersonate is only compatible with Unix-based operating systems like Linux and macOS. If you are on Windows and cannot use WSL (Windows Subsystem for Linux), consider using [ts-curl-impersonate](https://www.npmjs.com/package/ts-curl-impersonate) as an alternative, as it comes with native Windows support.
Configure and use the client
First, import node-curl-impersonate in your JavaScript or TypeScript script:
import CurlImpersonate from "node-curl-impersonate";
Keep in mind that node-curl-impersonate is an ES module, so you cannot import it with require() like a CommonJS package. If you do not know what that means, read our article on CommonJS vs. ES modules in Node.js.
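If your project still uses CommonJS, a common workaround is the dynamic import() expression, which CommonJS code can use to load an ES module asynchronously. The sketch below wraps it in a hypothetical loadDefaultExport helper that is not part of node-curl-impersonate. Depending on your Node.js version, require() may also be able to load some ES modules directly, so treat this as one option rather than the only one.

```javascript
// Works in a CommonJS file (e.g. scraper.cjs): import() returns a Promise
// that resolves to the module namespace object of an ES module.
async function loadDefaultExport(specifier) {
  const namespace = await import(specifier);
  // The default export of an ES module lives under the "default" key
  return namespace.default;
}

// Usage sketch, assuming node-curl-impersonate is installed:
// loadDefaultExport("node-curl-impersonate").then(async (CurlImpersonate) => {
//   const client = new CurlImpersonate("https://example.com", {
//     method: "GET",
//     impersonate: "chrome-116",
//     headers: {},
//   });
//   const { statusCode } = await client.makeRequest();
//   console.log(statusCode);
// });
```

Because import() is asynchronous, any code that needs CurlImpersonate has to run after the returned Promise resolves.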
CurlImpersonate is a constructor you can use to initialize a curl-impersonate request, as in the example below:
const curlImpersonate = new CurlImpersonate("https://example.com", {
  method: "GET",
  impersonate: "chrome-116",
  headers: {},
});
The constructor takes a URL and an optional options object. Here is a breakdown of the available options:
- method: The HTTP method to use for the request. Currently, only "GET" and "POST" are supported
- impersonate: A string identifying the browser to impersonate. The supported options are "chrome-110", "chrome-116", "firefox-109", and "firefox-117"
- headers: A key-value object containing custom HTTP headers to merge with the headers set automatically by curl-impersonate. Note that this option is required
- body: An optional object used as the JSON body of a POST request
- verbose: An optional boolean flag that enables verbose mode, which logs what the client does behind the scenes
- flags: An optional array of additional flags to pass to the underlying libcurl-impersonate library
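To catch mistakes in these options before a request runs, you could encode the constraints listed above in a small check. The validateCurlImpersonateOptions function below is hypothetical and not part of node-curl-impersonate; its rules mirror only what this article describes, so the library itself may accept or reject other combinations.

```javascript
// Hypothetical helper (not part of node-curl-impersonate) that mirrors the
// option constraints described in this article.
const SUPPORTED_METHODS = ["GET", "POST"];
const SUPPORTED_TARGETS = ["chrome-110", "chrome-116", "firefox-109", "firefox-117"];

function validateCurlImpersonateOptions(options) {
  const problems = [];
  if (!SUPPORTED_METHODS.includes(options.method)) {
    problems.push(`method must be one of ${SUPPORTED_METHODS.join(", ")}`);
  }
  // impersonate is only checked when it is provided
  if (options.impersonate !== undefined && !SUPPORTED_TARGETS.includes(options.impersonate)) {
    problems.push(`impersonate must be one of ${SUPPORTED_TARGETS.join(", ")}`);
  }
  // headers must always be present, even if it is an empty object
  if (typeof options.headers !== "object" || options.headers === null) {
    problems.push("headers is required (use {} for no custom headers)");
  }
  // the JSON body is meant for POST requests
  if (options.body !== undefined && options.method !== "POST") {
    problems.push("body is only used with POST requests");
  }
  if (options.verbose !== undefined && typeof options.verbose !== "boolean") {
    problems.push("verbose must be a boolean");
  }
  if (options.flags !== undefined && !Array.isArray(options.flags)) {
    problems.push("flags must be an array of strings");
  }
  return problems;
}
```

An empty array means the options passed every check; otherwise, each entry describes one problem.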
To make the request, call makeRequest() on the returned instance:
await curlImpersonate.makeRequest();
Alternatively, you can create the instance without a URL and pass the URL later to makeRequest():
const curlImpersonate = new CurlImpersonate(undefined, {
  method: "GET",
  impersonate: "chrome-116",
  headers: {},
});
await curlImpersonate.makeRequest("https://example.com");
// ...
// await curlImpersonate.makeRequest(...)
This allows you to reuse the same CurlImpersonate instance for multiple requests. That works best for GET requests, as POST requests usually require a body, which can only be set in the constructor.
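As a minimal sketch of that reuse pattern, the hypothetical fetchAllSequentially function below sends one GET request per URL through the same client and collects the results. It assumes, as in the Kick example later in this article, that makeRequest() resolves to an object with response and statusCode fields. To stay runnable on its own, it accepts any object exposing makeRequest(url).

```javascript
// Sketch: reuse a single client for several GET requests, one at a time.
// `client` can be any object whose makeRequest(url) returns a Promise,
// such as a CurlImpersonate instance created without a URL.
async function fetchAllSequentially(client, urls) {
  const results = [];
  for (const url of urls) {
    // awaiting inside the loop keeps the requests sequential, which is
    // gentler on the target server than firing them all in parallel
    const { statusCode, response } = await client.makeRequest(url);
    results.push({ url, statusCode, response });
  }
  return results;
}
```

With node-curl-impersonate, you would pass a CurlImpersonate instance created with undefined as its URL, as in the previous snippet.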
Do not forget that node-curl-impersonate only works on Unix-based systems. Attempting to use it on Windows will result in the following error:
Error: Unsupported Platform! win32
If you are a Windows user, you can work around that issue by running your script inside WSL.
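Following the compatibility notes above, a script can decide which binding to load before importing anything. The hypothetical pickCurlImpersonatePackage helper below is a sketch based only on this article's notes (node-curl-impersonate for Linux and macOS, ts-curl-impersonate for Windows). Inside WSL, Node.js reports process.platform as "linux", which is why running there avoids the error.

```javascript
// Hypothetical helper: map the current OS to the binding this article
// recommends for it. Returns null for platforms the article does not cover.
function pickCurlImpersonatePackage(platform = process.platform) {
  if (platform === "linux" || platform === "darwin") {
    return "node-curl-impersonate";
  }
  if (platform === "win32") {
    return "ts-curl-impersonate";
  }
  return null;
}
```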
Perform a request against an anti-bot-protected site
Kick is a popular streaming service, especially among younger audiences, and its popularity is growing quickly. If you try to perform web scraping on Kick, you are likely to encounter the following anti-bot detection page that blocks automated requests:
With node-curl-impersonate, you can bypass Kick's anti-bot measures and access the site's HTML content. Here is how you can do it:
import CurlImpersonate from "node-curl-impersonate";
(async () => {
  // initialize a curl-impersonate request with the specified options
  const curlImpersonate = new CurlImpersonate("https://kick.com/", {
    method: "GET",
    impersonate: "chrome-116",
    headers: {},
  });
  // perform the request
  const curlResponse = await curlImpersonate.makeRequest();
  // extract the response data
  const response = curlResponse.response;
  const responseStatusCode = curlResponse.statusCode;
  // if the server responded with a 4xx or 5xx error
  if (Number(responseStatusCode) >= 400) {
    // error handling logic...
    console.error("Error response:", response);
  } else {
    // handle the response...
    console.log(response);
  }
})();
If you launch the above script, the output will be the HTML content of Kick's home page:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charSet="utf-8" />
<meta
name="viewport"
content="width=device-width, initial-scale=1, maximum-scale=1, user-scalable=no"
/>
<link rel="preload" as="image" href="/img/kick-logo.svg" />
<!-- omitted for brevity... -->
<title>Kick</title>
<meta
name="description"
content="Kick is a streaming platform that makes it easy for you to find and watch your favorite content."
/>
<!-- omitted for brevity... -->
</head>
</html>
Awesome, the result confirms that you were able to access the target page without being blocked!
Setting browser-like HTTP headers is not enough to avoid blocks
curl-impersonate is certainly an interesting technology, but you may wonder what makes it so powerful and unique.
The common assumption when it comes to fooling anti-bot systems is that all you need to do is replicate browser requests. That is not entirely wrong, but it is far from easy to accomplish. Let's see why!
Open your browser in incognito mode and visit the Kick home page—the target web page of this article. In the “Network” tab of DevTools, you will see the request that the browser makes:
Notice how Chrome includes special HTTP headers in the request. At first glance, that seems to be the only difference from a request made with a regular HTTP client.
Right-click on the request and select the Copy > Copy as fetch (Node.js) option. This is what you would get:
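// top-level await requires an ES module (e.g. a .mjs file)
const response = await fetch("https://kick.com/", {
  "headers": {
    "sec-ch-ua": "\"Chromium\";v=\"130\", \"Google Chrome\";v=\"130\", \"Not?A_Brand\";v=\"99\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\"",
    "upgrade-insecure-requests": "1"
  },
  "referrerPolicy": "strict-origin-when-cross-origin",
  "body": null,
  "method": "GET"
});
// print the status code and the HTML returned by the server
console.log(response.status);
console.log(await response.text());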
fetch() is a function that comes from the Node.js Fetch API. See why the above code does not require an external library in our piece on the Fetch API in Node.js.
Copy the request to a JavaScript script and execute it. You will get the following [403 Forbidden](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403) page:
<!DOCTYPE html>
<html lang="en-US">
<head>
<title>Just a moment...</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<meta name="robots" content="noindex,nofollow">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- omitted for brevity... -->
</head>
</html>
In this case, Kick was able to detect your request as coming from an automated script and block it. How is that even possible? Read on!
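Before looking at why this happens, note that it can be handy for a scraper to recognize this kind of block page programmatically. The hypothetical looksLikeBotChallenge function below is a simple heuristic based only on the response shown above (a 4xx status with a "Just a moment..." title). Real challenge pages vary, so treat it as a starting point rather than a reliable detector.

```javascript
// Heuristic sketch: flag responses that resemble the anti-bot interstitial
// shown above. It extracts the <title> text and checks the status code.
function looksLikeBotChallenge(statusCode, html) {
  const match = /<title>([^<]*)<\/title>/i.exec(html);
  const title = match ? match[1].trim() : "";
  // the blocked response above was a 4xx error with this title
  return statusCode >= 400 && /^just a moment/i.test(title);
}
```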
Why curl-impersonate is effective against most anti-bot solutions
What we did above is mimic the behavior of a browser at the application layer, making an HTTPS request equivalent to the one your browser makes. But remember, the Internet operates over a stack of layers!
To reach the server, your HTTPS request must pass through the TLS channel created at the transport layer, then through the IP layer, and so on.
As a web developer, you spend most of your day working with technologies at the application layer. However, it is essential not to overlook the underlying layers that enable the application layer to function.
Anti-bot solutions analyze all aspects of incoming requests, from high-level application details down to lower-level elements. To determine if a request is genuine, they cannot rely solely on application-layer details at the HTTPS level. Otherwise, eluding bot detection would be a piece of cake!
So, the most advanced bot protection systems on the market, like Cloudflare, focus on low-level network aspects, such as the TLS fingerprint of the request.
TLS fingerprinting as a key to discovering bots
When a client like your browser or scraping bot initiates a secure connection with a server, the two perform a TLS handshake.
During that process, the client and server negotiate encryption settings. This handshake includes details like the TLS version, cipher suites, and extensions that the client supports.
Based on the information exchanged during the handshake, it is possible to generate a "fingerprint" that helps distinguish one client from another.
This is how most bot detection systems can tell if you are using a real browser or not. Browsers use well-known TLS libraries that are generally different from those used by HTTP clients.
The consequence of this is that the TLS fingerprint of a request made by a browser is quite different from that of an HTTP client — even if they share the same HTTP headers.
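One widely used way to turn those handshake details into a fingerprint is the JA3 format, which is what the ja3 field in the API responses below contains: five comma-separated fields (TLS version, cipher suites, extensions, elliptic curves, and point formats), each holding dash-separated decimal IDs. Since Chrome shuffles the order of its TLS extensions between connections, the ja3n field appears to be a normalized variant with the extension IDs sorted. The sketch below parses a JA3 string and rebuilds that normalized form; the function names are hypothetical, and the sorting rule is inferred from the ja3/ja3n pairs shown below rather than taken from the API's documentation.

```javascript
// Minimal JA3 parser: "version,ciphers,extensions,curves,pointFormats",
// where each list field holds dash-separated decimal IDs.
function parseJa3(ja3) {
  const [version, ciphers, extensions, curves, pointFormats] = ja3.split(",");
  const toIds = (field) => (field ? field.split("-").map(Number) : []);
  return {
    version: Number(version),
    ciphers: toIds(ciphers),
    extensions: toIds(extensions),
    curves: toIds(curves),
    pointFormats: toIds(pointFormats),
  };
}

// Normalized JA3: sort the extension IDs numerically so that a client that
// shuffles its extension order still produces one stable string.
function normalizeJa3(ja3) {
  const fp = parseJa3(ja3);
  const sortedExtensions = [...fp.extensions].sort((a, b) => a - b);
  return [
    fp.version,
    fp.ciphers.join("-"),
    sortedExtensions.join("-"),
    fp.curves.join("-"),
    fp.pointFormats.join("-"),
  ].join(",");
}
```

Running normalizeJa3 on the ja3 value that Chrome 130 returns below produces the same string as its ja3n field, even though the raw extension order looks arbitrary.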
You can verify that the fingerprints differ by targeting the Scrapfly TLS Fingerprinting API in your browser and comparing the result with those of clients like node-curl-impersonate and the Fetch API.
Chrome 130 returns:
{
"ja3": "772,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,35-27-43-11-16-65281-10-13-65037-5-18-23-45-0-17513-51,25497-29-23-24,0",
"ja3n": "772,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,0-5-10-11-13-16-18-23-27-35-43-45-51-17513-65037-65281,25497-29-23-24,0",
"ja3_digest": "370fa7191028e260eac290c51745d8f8",
"ja3n_digest": "eb5a4e1d21094c5caf044c8f3117f306",
"scrapfly_fp": "version:772|ch_ciphers:GREASE-4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53|ch_extensions:GREASE-0-5-10-11-13-16-18-23-27-35-43-45-51-17513-65037-65281-GREASE|groups:GREASE-25497-29-23-24|points:0|compression:0|supported_versions:GREASE-772-771|supported_protocols:h2-http11|key_shares:GREASE-25497-29|psk:1|signature_algs:1027-2052-1025-1283-2053-1281-2054-1537|early_data:0|",
"scrapfly_fp_digest": "58e05a62bade1452454ea0b0cc49c971",
"tls": {
"version": "0x0303 - TLS 1.2",
"ciphers": [
"0x3A3A",
"TLS_AES_128_GCM_SHA256",
"TLS_AES_256_GCM_SHA384",
"TLS_CHACHA20_POLY1305_SHA256",
"TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
"TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
"TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
"TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256",
"TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256",
"TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA",
"TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA",
"TLS_RSA_WITH_AES_128_GCM_SHA256",
"TLS_RSA_WITH_AES_256_GCM_SHA384",
"TLS_RSA_WITH_AES_128_CBC_SHA",
"TLS_RSA_WITH_AES_256_CBC_SHA"
],
"curves": [
"TLS_GREASE (0x1A1A)",
"Unknown curve 0x6399",
"X25519 (29)",
"secp256r1 (23)",
"secp384r1 (24)"
],
"extensions": [
"GREASE (0x4A4A)",
"session_ticket (35) (IANA)",
"compress_certificate (27) (IANA)",
"supported_versions (43) (IANA)",
"ec_point_formats (11) (IANA)",
"application_layer_protocol_negotiation (16) (IANA)",
"extensionRenegotiationInfo (boringssl) (65281) (IANA)",
"supported_groups (10) (IANA)",
"signature_algorithms (13) (IANA)",
"extensionEncryptedClientHello (65037) (boringssl)",
"status_request (5) (IANA)",
"signed_certificate_timestamp (18) (IANA)",
"extended_master_secret (23) (IANA)",
"psk_key_exchange_modes (45) (IANA)",
"server_name (0) (IANA)",
"extensionApplicationSettings (17513) (boringssl)",
"key_share (51) (IANA)",
"GREASE (0x8A8A)"
],
"points": [
"0x00"
],
"protocols": [
"h2",
"http/1.1"
],
"versions": [
"43690",
"772",
"771"
],
"handshake_duration": "184.049664ms",
"is_session_resumption": false,
"session_ticket_supported": true,
"support_secure_renegotiation": true,
"supported_tls_versions": [
43690,
772,
771
],
"supported_protocols": [
"h2",
"http11"
],
"signature_algorithms": [
1027,
2052,
1025,
1283,
2053,
1281,
2054,
1537
],
"psk_key_exchange_mode": "AQ==",
"cert_compression_algorithms": "AA==",
"early_data": false,
"using_psk": false,
"selected_protocol": "h2",
"selected_curve_group": 29,
"selected_cipher_suite": 4865,
"key_shares": [
6682,
25497,
29
]
}
}
node-curl-impersonate returns:
{
"ja3": "772,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,35-43-65281-45-51-5-16-0-27-13-23-11-10-17513-18,29-23-24,0",
"ja3n": "772,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,0-5-10-11-13-16-18-23-27-35-43-45-51-17513-65281,29-23-24,0",
"ja3_digest": "d737eab1c0aba59b4b466cf91d42a47a",
"ja3n_digest": "0fb2c926015957b7e56038e269a7c58a",
"scrapfly_fp": "version:772|ch_ciphers:GREASE-4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53|ch_extensions:GREASE-0-5-10-11-13-16-18-23-27-35-43-45-51-17513-65281-GREASE|groups:GREASE-29-23-24|points:0|compression:0|supported_versions:GREASE-772-771|supported_protocols:h2-http11|key_shares:GREASE-29|psk:1|signature_algs:1027-2052-1025-1283-2053-1281-2054-1537|early_data:0|",
"scrapfly_fp_digest": "81fbc443bb8cb67310e62d982c1e4c98",
"tls": {
"version": "0x0303 - TLS 1.2",
"ciphers": [
"0x6A6A",
"TLS_AES_128_GCM_SHA256",
"TLS_AES_256_GCM_SHA384",
"TLS_CHACHA20_POLY1305_SHA256",
"TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
"TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
"TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
"TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256",
"TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256",
"TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA",
"TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA",
"TLS_RSA_WITH_AES_128_GCM_SHA256",
"TLS_RSA_WITH_AES_256_GCM_SHA384",
"TLS_RSA_WITH_AES_128_CBC_SHA",
"TLS_RSA_WITH_AES_256_CBC_SHA"
],
"curves": [
"TLS_GREASE (0xBABA)",
"X25519 (29)",
"secp256r1 (23)",
"secp384r1 (24)"
],
"extensions": [
"GREASE (0x0A0A)",
"session_ticket (35) (IANA)",
"supported_versions (43) (IANA)",
"extensionRenegotiationInfo (boringssl) (65281) (IANA)",
"psk_key_exchange_modes (45) (IANA)",
"key_share (51) (IANA)",
"status_request (5) (IANA)",
"application_layer_protocol_negotiation (16) (IANA)",
"server_name (0) (IANA)",
"compress_certificate (27) (IANA)",
"signature_algorithms (13) (IANA)",
"extended_master_secret (23) (IANA)",
"ec_point_formats (11) (IANA)",
"supported_groups (10) (IANA)",
"extensionApplicationSettings (17513) (boringssl)",
"signed_certificate_timestamp (18) (IANA)",
"GREASE (0x5A5A)",
"padding (21) (IANA)"
],
"points": [
"0x00"
],
"protocols": [
"h2",
"http/1.1"
],
"versions": [
"23130",
"772",
"771"
],
"handshake_duration": "221.314783ms",
"is_session_resumption": false,
"session_ticket_supported": true,
"support_secure_renegotiation": true,
"supported_tls_versions": [
23130,
772,
771
],
"supported_protocols": [
"h2",
"http11"
],
"signature_algorithms": [
1027,
2052,
1025,
1283,
2053,
1281,
2054,
1537
],
"psk_key_exchange_mode": "AQ==",
"cert_compression_algorithms": "AA==",
"early_data": false,
"using_psk": false,
"selected_protocol": "h2",
"selected_curve_group": 29,
"selected_cipher_suite": 4865,
"key_shares": [
47802,
29
]
}
}
fetch() returns:
{
"ja3": "772,4866-4867-4865-49199-49195-49200-49196-158-49191-103-49192-107-163-159-52393-52392-52394-49327-49325-49315-49311-49245-49249-49239-49235-162-49326-49324-49314-49310-49244-49248-49238-49234-49188-106-49187-64-49162-49172-57-56-49161-49171-51-50-157-49313-49309-49233-156-49312-49308-49232-61-60-53-47-255,0-11-10-35-16-22-23-13-43-45-51,29-23-30-25-24-256-257-258-259-260,0-1-2",
"ja3n": "772,4866-4867-4865-49199-49195-49200-49196-158-49191-103-49192-107-163-159-52393-52392-52394-49327-49325-49315-49311-49245-49249-49239-49235-162-49326-49324-49314-49310-49244-49248-49238-49234-49188-106-49187-64-49162-49172-57-56-49161-49171-51-50-157-49313-49309-49233-156-49312-49308-49232-61-60-53-47-255,0-10-11-13-16-22-23-35-43-45-51,29-23-30-25-24-256-257-258-259-260,0-1-2",
"ja3_digest": "f376ddf05a7a38d2fb080069329ce2a2",
"ja3n_digest": "7b70814919c3f12abb0b7d0b603462aa",
"scrapfly_fp": "version:772|ch_ciphers:4866-4867-4865-49199-49195-49200-49196-158-49191-103-49192-107-163-159-52393-52392-52394-49327-49325-49315-49311-49245-49249-49239-49235-162-49326-49324-49314-49310-49244-49248-49238-49234-49188-106-49187-64-49162-49172-57-56-49161-49171-51-50-157-49313-49309-49233-156-49312-49308-49232-61-60-53-47-255|ch_extensions:0-10-11-13-16-22-23-35-43-45-51|groups:29-23-30-25-24-256-257-258-259-260|points:0-1-2|compression:0|supported_versions:772-771|supported_protocols:http11|key_shares:29|psk:1|signature_algs:1027-1283-1539-2055-2056-2057-2058-2059-2052-2053-2054-1025-1281-1537-771-769-770-1026-1282-1538|early_data:0|",
"scrapfly_fp_digest": "8b2bf560717049d7bb701693d9f0d90b",
"tls": {
"version": "0x0303 - TLS 1.2",
"ciphers": [
"TLS_AES_256_GCM_SHA384",
"TLS_CHACHA20_POLY1305_SHA256",
"TLS_AES_128_GCM_SHA256",
"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
"TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
"TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
"TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
"TLS_DHE_RSA_WITH_AES_128_GCM_SHA256",
"TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256",
"TLS_DHE_RSA_WITH_AES_128_CBC_SHA256",
"TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384",
"TLS_DHE_RSA_WITH_AES_256_CBC_SHA256",
"TLS_DHE_DSS_WITH_AES_256_GCM_SHA384",
"TLS_DHE_RSA_WITH_AES_256_GCM_SHA384",
"TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256",
"TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256",
"TLS_DHE_RSA_WITH_CHACHA20_POLY1305_SHA256",
"TLS_ECDHE_ECDSA_WITH_AES_256_CCM_8",
"TLS_ECDHE_ECDSA_WITH_AES_256_CCM",
"TLS_DHE_RSA_WITH_AES_256_CCM_8",
"TLS_DHE_RSA_WITH_AES_256_CCM",
"TLS_ECDHE_ECDSA_WITH_ARIA_256_GCM_SHA384",
"TLS_ECDHE_RSA_WITH_ARIA_256_GCM_SHA384",
"TLS_DHE_DSS_WITH_ARIA_256_GCM_SHA384",
"TLS_DHE_RSA_WITH_ARIA_256_GCM_SHA384",
"TLS_DHE_DSS_WITH_AES_128_GCM_SHA256",
"TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8",
"TLS_ECDHE_ECDSA_WITH_AES_128_CCM",
"TLS_DHE_RSA_WITH_AES_128_CCM_8",
"TLS_DHE_RSA_WITH_AES_128_CCM",
"TLS_ECDHE_ECDSA_WITH_ARIA_128_GCM_SHA256",
"TLS_ECDHE_RSA_WITH_ARIA_128_GCM_SHA256",
"TLS_DHE_DSS_WITH_ARIA_128_GCM_SHA256",
"TLS_DHE_RSA_WITH_ARIA_128_GCM_SHA256",
"TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384",
"TLS_DHE_DSS_WITH_AES_256_CBC_SHA256",
"TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256",
"TLS_DHE_DSS_WITH_AES_128_CBC_SHA256",
"TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA",
"TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA",
"TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
"TLS_DHE_DSS_WITH_AES_256_CBC_SHA",
"TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA",
"TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA",
"TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
"TLS_DHE_DSS_WITH_AES_128_CBC_SHA",
"TLS_RSA_WITH_AES_256_GCM_SHA384",
"TLS_RSA_WITH_AES_256_CCM_8",
"TLS_RSA_WITH_AES_256_CCM",
"TLS_RSA_WITH_ARIA_256_GCM_SHA384",
"TLS_RSA_WITH_AES_128_GCM_SHA256",
"TLS_RSA_WITH_AES_128_CCM_8",
"TLS_RSA_WITH_AES_128_CCM",
"TLS_RSA_WITH_ARIA_128_GCM_SHA256",
"TLS_RSA_WITH_AES_256_CBC_SHA256",
"TLS_RSA_WITH_AES_128_CBC_SHA256",
"TLS_RSA_WITH_AES_256_CBC_SHA",
"TLS_RSA_WITH_AES_128_CBC_SHA",
"TLS_EMPTY_RENEGOTIATION_INFO"
],
"curves": [
"X25519 (29)",
"secp256r1 (23)",
"X448 (30)",
"secp521r1 (25)",
"secp384r1 (24)",
"ffdhe2048 (256)",
"ffdhe3072 (257)",
"ffdhe4096 (258)",
"ffdhe6144 (259)",
"ffdhe8192 (260)"
],
"extensions": [
"server_name (0) (IANA)",
"ec_point_formats (11) (IANA)",
"supported_groups (10) (IANA)",
"session_ticket (35) (IANA)",
"application_layer_protocol_negotiation (16) (IANA)",
"encrypt_then_mac (22) (IANA)",
"extended_master_secret (23) (IANA)",
"signature_algorithms (13) (IANA)",
"supported_versions (43) (IANA)",
"psk_key_exchange_modes (45) (IANA)",
"key_share (51) (IANA)"
],
"points": [
"0x00",
"0x01",
"0x02"
],
"protocols": [
"http/1.1"
],
"versions": [
"772",
"771"
],
"handshake_duration": "195.733862ms",
"is_session_resumption": false,
"session_ticket_supported": true,
"support_secure_renegotiation": true,
"supported_tls_versions": [
772,
771
],
"supported_protocols": [
"http11"
],
"signature_algorithms": [
1027,
1283,
1539,
2055,
2056,
2057,
2058,
2059,
2052,
2053,
2054,
1025,
1281,
1537,
771,
769,
770,
1026,
1282,
1538
],
"psk_key_exchange_mode": "AQ==",
"cert_compression_algorithms": "AA==",
"early_data": false,
"using_psk": false,
"selected_protocol": "http/1.1",
"selected_curve_group": 29,
"selected_cipher_suite": 4865,
"key_shares": [
29
]
}
}
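As you can tell, the TLS fingerprints generated by Chrome and node-curl-impersonate are much closer to each other than to the one produced by fetch(). For instance, Chrome and node-curl-impersonate advertise the same 15 cipher suites and support HTTP/2, while fetch() offers a much longer cipher list and only HTTP/1.1.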
Most likely, the remaining differences between the Chrome and node-curl-impersonate fingerprints come from the different browser versions they are based on (Chrome 130 versus the chrome-116 profile). For example, only Chrome 130 advertises extension 65037 (Encrypted Client Hello). This closeness plays a key role in bot detection and explains why node-curl-impersonate was able to retrieve the HTML content of the Kick home page while the Fetch API failed.
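To see exactly where two fingerprints diverge, you can compare their ja3 strings as sets of IDs, ignoring order. The following is a minimal sketch with a hypothetical diffJa3 helper; it compares only the four ID lists, not the TLS version field.

```javascript
// Compare two JA3 strings field by field and report the IDs that appear in
// one fingerprint but not the other (ordering is ignored).
function diffJa3(a, b) {
  const fields = ["ciphers", "extensions", "curves", "pointFormats"];
  const toSets = (ja3) => {
    const parts = ja3.split(",");
    // parts[0] is the TLS version; parts[1..4] are the dash-separated ID lists
    return Object.fromEntries(
      fields.map((name, i) => [
        name,
        new Set(parts[i + 1] ? parts[i + 1].split("-").map(Number) : []),
      ])
    );
  };
  const left = toSets(a);
  const right = toSets(b);
  const report = {};
  for (const name of fields) {
    report[name] = {
      onlyInA: [...left[name]].filter((id) => !right[name].has(id)).sort((x, y) => x - y),
      onlyInB: [...right[name]].filter((id) => !left[name].has(id)).sort((x, y) => x - y),
    };
  }
  return report;
}
```

Applied to the Chrome 130 and node-curl-impersonate ja3 values above, it reports identical cipher suites, extension 65037 and curve 25497 (0x6399) present only in Chrome, and nothing present only in node-curl-impersonate.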
How curl-impersonate works
To achieve the result highlighted earlier, the team behind curl-impersonate had to patch curl to resemble a browser as closely as possible. In particular, these are the changes they introduced:
- Compiling curl with BoringSSL, the TLS library used by Google Chrome, instead of OpenSSL. For the Firefox version, curl was compiled with NSS, Firefox’s TLS library
- Modifying the way curl configures several SSL options and TLS extensions
- Adding support for new TLS extensions
- Adjusting the settings for curl's HTTP/2 connections
- Running curl with non-default flags, such as --ciphers, --curves, and specific -H headers (like the [User-Agent](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent)), to further mimic the behavior of a browser
These modifications make requests sent by curl-impersonate nearly identical, from a network perspective, to those of a real browser.
You can find all the implementation details in the guides on the official blog, which explain how they managed to fully impersonate Chrome and mimic Firefox.
The advantages of curl-impersonate over browser automation tools
If you are an expert in Node.js web automation, you might assume that using headless browsers controlled by technologies like Playwright or Puppeteer is more effective than utilizing curl-impersonate. Unsurprisingly, both libraries appear in our list of the best Node.js web scraping technologies. After all, browser automation tools also enable you to interact with the elements on the page, while curl-impersonate is just an HTTP client that can only retrieve web pages.
Still, there are Node.js web automation scenarios where a library like node-curl-impersonate might be a better choice than Playwright or Puppeteer. The reason is that anti-bot systems often use a two-step approach to detect and block bots.
The first step checks whether the request is coming from a legitimate browser, as explained earlier in this article. If the request seems suspicious, it is blocked. Otherwise, the server delivers the HTML document of the page.
That page includes special JavaScript scripts that inspect the browser's settings and configurations to generate a browser fingerprint, which is then sent back to the anti-bot system to determine whether the user is legitimate. This second step works because automation tools tend to configure browsers in ways that differ from regular browsers, and these differences are enough for anti-bot solutions to recognize that they are dealing with an automated request. For more information, check out our guide on Playwright Extra.
In contrast, curl-impersonate cannot render JavaScript, so it skips the second step entirely. As long as the target site does not require that second check to treat a visitor as legitimate, node-curl-impersonate can keep sending requests to the target server without the resource overhead and slowness typical of browser automation tools, even when they run in headless mode.
Conclusion
In this article, we explored what curl-impersonate is, how to use it in Node.js, and why it can be more effective than browser automation tools at bypassing anti-bot systems. We learned that the key to its success lies in low-level network details, such as TLS fingerprinting. With this special build of curl, you can take your automation scripts in Node.js to the next level! If you have any further questions about using curl-impersonate in Node.js, feel free to comment below.