Web scraping Google Shopping Product Specs with Node.js
Mikhail Zub
Posted on November 28, 2022
What will be scraped
Full code
If you don't need an explanation, have a look at the full code example in the online IDE
const cheerio = require("cheerio");
const axios = require("axios");

const productId = "14938360545167499200"; // Parameter defines the ID of a product you want to get the results for

const AXIOS_OPTIONS = {
  headers: {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36",
  }, // adding the User-Agent header as one way to prevent the request from being blocked
  params: {
    hl: "en", // parameter defines the language to use for the Google search
    gl: "us", // parameter defines the country to use for the Google search
  },
};

function getProductSpecs() {
  return axios.get(`https://www.google.com/shopping/product/${productId}/specs`, AXIOS_OPTIONS).then(function ({ data }) {
    let $ = cheerio.load(data);
    let category; // holds the most recent category heading while iterating over table rows
    return {
      productId,
      title: $(".BvQan")?.text().trim(),
      // remove every thousands separator (not just the first one) before parsing
      reviews: parseInt($(".HiT7Id > span")?.attr("aria-label")?.replace(/,/g, ""), 10),
      rating: parseFloat($(".UzThIf")?.attr("aria-label")),
      extensions: Array.from($(".OA4wid")).map((el) => $(el).text().replaceAll("·", "").trim()),
      description: $(".bwcLrc")?.text().trim(),
      specsResults: Array.from($(".O2pTHb tr")).reduce((results, el) => {
        if (!$(el).hasClass("vm91i")) {
          category = $(el).text().trim();
        } else {
          results[`${category}`] = {
            ...results[`${category}`],
            [$(el).find(".ipBhab")?.text().trim()]: $(el).find(".AnDf0c")?.text().trim(),
          };
        }
        return { ...results };
      }, {}),
    };
  });
}

getProductSpecs().then((result) => console.dir(result, { depth: null }));
Preparation
First, we need to create a Node.js* project and add the npm packages cheerio, to parse parts of the HTML markup, and axios, to make a request to a website.
To do this, in the directory with our project, open the command line and enter:
$ npm init -y
And then:
$ npm i cheerio axios
*If you don't have Node.js installed, you can download it from nodejs.org and follow the installation documentation.
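The code in this post uses optional chaining (?.) and String.prototype.replaceAll, which older Node.js releases lack. Here is a minimal sketch of a sanity check you could run before starting. The supportsScraperSyntax helper is a made-up name for this illustration, not part of any library:

```javascript
// Hypothetical helper: reports whether this Node.js runtime supports
// the language features used by the scraper in this post.
function supportsScraperSyntax() {
  const hasReplaceAll = typeof String.prototype.replaceAll === "function";
  let hasOptionalChaining = true;
  try {
    // Compiled at call time, so an old runtime throws a SyntaxError here
    // instead of failing when the whole file is loaded.
    new Function("return undefined?.x");
  } catch (error) {
    hasOptionalChaining = false;
  }
  return { nodeVersion: process.versions.node, hasReplaceAll, hasOptionalChaining };
}

console.log(supportsScraperSyntax());
```

If either flag is false, updating Node.js is probably simpler than rewriting the scraper.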
Process
First of all, we need to extract data from HTML elements. The process of getting the right CSS selectors is fairly easy via the SelectorGadget Chrome extension, which lets us grab CSS selectors by clicking on the desired element in the browser. However, it does not always work perfectly, especially when the website relies heavily on JavaScript.
We have a dedicated Web Scraping with CSS Selectors blog post at SerpApi if you want to know a little more about them.
The GIF below illustrates the approach of selecting different parts of the results.
Code explanation
Declare constants from the cheerio and axios libraries:
const cheerio = require("cheerio");
const axios = require("axios");
Next, we write the product ID and the request options: HTTP headers with a User-Agent, which makes the request look like a "real" user visit, and the necessary parameters for making a request:
const productId = "14938360545167499200"; // Parameter defines the ID of a product you want to get the results for

const AXIOS_OPTIONS = {
  headers: {
    "User-Agent":
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36",
  }, // adding the User-Agent header as one way to prevent the request from being blocked
  params: {
    hl: "en", // parameter defines the language to use for the Google search
    gl: "us", // parameter defines the country to use for the Google search
  },
};
📌Note: The default axios request user-agent is axios/<axios_version>, so websites can tell that a script is sending the request and might block it. Check what your user-agent is.
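To see roughly which URL these options describe, here is a sketch that builds it with the built-in URL and URLSearchParams classes instead of axios. The buildSpecsUrl helper is hypothetical, and axios may encode some characters differently, so treat the result as an approximation of what gets requested:

```javascript
// Hypothetical helper: builds the specs page URL from a product ID and query params
function buildSpecsUrl(productId, params) {
  const url = new URL(`https://www.google.com/shopping/product/${productId}/specs`);
  // URLSearchParams serializes { hl: "en", gl: "us" } as "hl=en&gl=us"
  url.search = new URLSearchParams(params).toString();
  return url.toString();
}

console.log(buildSpecsUrl("14938360545167499200", { hl: "en", gl: "us" }));
// prints "https://www.google.com/shopping/product/14938360545167499200/specs?hl=en&gl=us"
```

Opening the printed URL in a browser is a quick way to confirm that the product ID points at the page you expect.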
Next, we write a function that makes the request and returns the received data. The axios response has a data key, which we destructure and parse with cheerio:
function getProductSpecs() {
  return axios
    .get(`https://www.google.com/shopping/product/${productId}/specs`, AXIOS_OPTIONS)
    .then(function ({ data }) {
      let $ = cheerio.load(data);
      ...
    })
}
Next, we need to get the different parts of the page using the following methods:
title: $(".BvQan")?.text().trim(),
reviews: parseInt($(".HiT7Id > span")?.attr("aria-label")?.replace(/,/g, ""), 10), // remove every thousands separator before parsing
rating: parseFloat($(".UzThIf")?.attr("aria-label")),
extensions: Array.from($(".OA4wid")).map((el) => $(el).text().replaceAll("·", "").trim()),
description: $(".bwcLrc")?.text().trim(),
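The number and text clean-up used for reviews, rating, and extensions can be tried without any network request. The sketch below assumes aria-label values shaped like "5,109 reviews" and "4.5 out of 5 stars". Those sample strings and the helper names are assumptions for illustration, not values confirmed from the page:

```javascript
// Hypothetical helpers mirroring the reviews/rating/extensions parsing above

// Removes every thousands separator before parsing, e.g. "1,234,567 reviews" -> 1234567
function parseReviews(label) {
  return parseInt(label?.replace(/,/g, ""), 10);
}

// parseFloat reads the leading number and ignores the rest of the string
function parseRating(label) {
  return parseFloat(label);
}

// Strips the "·" separators and surrounding whitespace from an extension label
function cleanExtension(text) {
  return text.replaceAll("·", "").trim();
}

console.log(parseReviews("5,109 reviews")); // 5109
console.log(parseReviews("1,234,567 reviews")); // 1234567
console.log(parseReviews(undefined)); // NaN, the attribute was missing
console.log(parseRating("4.5 out of 5 stars")); // 4.5
console.log(cleanExtension(" · Dual SIM ")); // "Dual SIM"
```

With a plain string pattern, replace(",", "") removes only the first comma, so "1,234,567" would parse as 1234. The /,/g regular expression removes all of them.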
The screenshot below shows the DOM structure of the specs table:
To get the specs, we declare an empty category variable that holds the current category. Then we use the reduce() method (it lets us build the object with results) to iterate over an array built with the Array.from() method.
In the reduce callback, we check whether the current element lacks the class "vm91i" (hasClass() method). If it does, it is a category heading row, and we store its text in category. The rows that follow belong to this category, so we add them to the current category (using spread syntax). The same steps repeat for the next category:
let category;
return {
  ...
  specsResults: Array.from($(".O2pTHb tr")).reduce((results, el) => {
    if (!$(el).hasClass("vm91i")) {
      category = $(el).text().trim();
    } else {
      results[`${category}`] = {
        ...results[`${category}`],
        [$(el).find(".ipBhab")?.text().trim()]: $(el).find(".AnDf0c")?.text().trim(),
      };
    }
    return { ...results };
  }, {}),
};
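The grouping logic is easier to follow on plain data. The sketch below replaces the table rows with a made-up array, where the isSpecRow flag stands in for the hasClass("vm91i") check. The rows array and the groupSpecs name are assumptions for this illustration:

```javascript
// Sample rows standing in for the <tr> elements of the specs table
const rows = [
  { isSpecRow: false, text: "General" },
  { isSpecRow: true, name: "Product Type", value: "Smartphone" },
  { isSpecRow: true, name: "Form Factor", value: "Touch" },
  { isSpecRow: false, text: "Cellular" },
  { isSpecRow: true, name: "Mobile Broadband Generation", value: "5G" },
];

function groupSpecs(rows) {
  let category; // the most recent heading row seen so far
  return rows.reduce((results, row) => {
    if (!row.isSpecRow) {
      // Heading row: later spec rows are filed under this name
      category = row.text;
    } else {
      // Spec row: copy the specs collected so far and add this one
      results[category] = { ...results[category], [row.name]: row.value };
    }
    return results;
  }, {});
}

console.log(groupSpecs(rows));
// {
//   General: { 'Product Type': 'Smartphone', 'Form Factor': 'Touch' },
//   Cellular: { 'Mobile Broadband Generation': '5G' }
// }
```

One consequence of this design: a spec row that appears before any heading row would be stored under the key "undefined", so the approach relies on the table starting with a category heading.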
Now we can launch our parser:
$ node YOUR_FILE_NAME # YOUR_FILE_NAME is the name of your .js file
Output
{
"productId":"14938360545167499200",
"title":"Apple iPhone 12 Pro - 128 GB - Silver - Unlocked",
"reviews":5109,
"rating":4.5,
"extensions":[
"Smartphone",
"Dual SIM",
"iOS",
"5G",
"With Wireless Charging",
"Triple Lens",
"GSM",
"CDMA",
"With OLED Display",
"Facial Recognition"
],
"description":"Apple · iPhone · iPhone 12 · iPhone 12 Pro · iOS · 6.1′′ · Facial Recognition · 12 MP front camera · 12 MP rear camera · Smartphone · With Wireless Charging. Beautifully bright 6.1-inch Super Retina XDR display. Ceramic Shield with 4x better drop performance. Incredible low-light photography with a new Pro camera system, and 4x optical zoom range. Cinema-grade Dolby Vision video recording, editing, and playback. Night mode portraits and next-level AR experiences with the LiDAR Scanner. Powerful A14 Bionic chip. 5G capable. And new MagSafe accessories for easy attach and faster wireless charging. For infinitely spectacular possibilities. Legal. The display has rounded corners. When measured as a rectangle, the screen is 6.06 inches diagonally. Actual viewable area is less. Claim based on iPhone 12 Pro Ceramic Shield front compared with previous-generation iPhone. Data plan required. 5G is available in select markets and through select carriers. Speeds vary based on site conditions and carrier. Accessories are sold separately. Apple ProRAW coming soon. iPhone 12 Pro is splash, water, and dust resistant and was tested under controlled laboratory conditions with a rating of IP68 under IEC standard 60529 (maximum depth of 6 meters up to 30 minutes). Splash, water, and dust resistance are not permanent conditions. Resistance might decrease as a result of normal wear. Do not attempt to charge a wet iPhone; refer to the user guide for cleaning and drying instructions.",
"specsResults":{
"General":{
"Product Type":"Smartphone",
"Manufacturer Model Number":"A2341",
"Form Factor":"Touch",
...and other specs
},
"Cellular":{
"Technology":"CDMA2000 1X / GSM / WCDMA (UMTS)",
"Mobile Broadband Generation":"5G",
"Service Provider":"Not specified",
...and other specs
},
...and other categories
}
}
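Once you have the result object, the nested specsResults can be flattened for display or export. Here is a small sketch using a trimmed copy of the output above. The flattenSpecs helper is hypothetical and not part of the scraper:

```javascript
// Hypothetical helper: turns { Category: { Name: Value } } into "Category / Name: Value" lines
function flattenSpecs(specsResults) {
  return Object.entries(specsResults).flatMap(([category, specs]) =>
    Object.entries(specs).map(([name, value]) => `${category} / ${name}: ${value}`)
  );
}

// Trimmed sample taken from the output shown above
const sample = {
  General: { "Product Type": "Smartphone", "Form Factor": "Touch" },
  Cellular: { "Mobile Broadband Generation": "5G" },
};

console.log(flattenSpecs(sample));
// [
//   'General / Product Type: Smartphone',
//   'General / Form Factor: Touch',
//   'Cellular / Mobile Broadband Generation: 5G'
// ]
```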
Using Google Product Specs Results API from SerpApi
This section shows a comparison between the DIY solution and our solution.
The biggest difference is that you don't need to create the parser from scratch and maintain it.
There's also a chance that Google might block the request at some point. We handle that on our backend, so there's no need to figure out how to do it yourself or which CAPTCHA or proxy provider to use.
First, we need to install google-search-results-nodejs:
$ npm i google-search-results-nodejs
Here's the full code example, if you don't need an explanation:
const SerpApi = require("google-search-results-nodejs");
const search = new SerpApi.GoogleSearch(process.env.API_KEY); // your API key from serpapi.com

const params = {
  product_id: "14938360545167499200", // Parameter defines the ID of a product you want to get the results for.
  engine: "google_product", // search engine
  device: "desktop", // Parameter defines the device to use to get the results. It can be set to "desktop" (default), "tablet", or "mobile"
  hl: "en", // parameter defines the language to use for the Google search
  gl: "us", // parameter defines the country to use for the Google search
  specs: true, // parameter for fetching specs results
};

const getJson = () => {
  return new Promise((resolve) => {
    search.json(params, resolve);
  });
};

const getResults = async () => {
  const json = await getJson();
  return { ...json.product_results, specsResults: json.specs_results };
};

getResults().then((result) => console.dir(result, { depth: null }));
Code explanation
First, we need to declare SerpApi from the google-search-results-nodejs library and define a new search instance with your API key from SerpApi:
const SerpApi = require("google-search-results-nodejs");
const search = new SerpApi.GoogleSearch(process.env.API_KEY); // your API key from serpapi.com
Next, we write the necessary parameters for making a request:
const params = {
  product_id: "14938360545167499200", // Parameter defines the ID of a product you want to get the results for.
  engine: "google_product", // search engine
  device: "desktop", // Parameter defines the device to use to get the results. It can be set to "desktop" (default), "tablet", or "mobile"
  hl: "en", // parameter defines the language to use for the Google search
  gl: "us", // parameter defines the country to use for the Google search
  specs: true, // parameter for fetching specs results
};
Next, we wrap the search method from the SerpApi library in a promise to further work with the search results:
const getJson = () => {
  return new Promise((resolve) => {
    search.json(params, resolve);
  });
};
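The same wrapping pattern can be tried offline with a stand-in for the search object. In the sketch below, fakeSearch is a hypothetical stub that mimics only the json(params, callback) call shape used above. It is not the SerpApi client, and it returns made-up data:

```javascript
// Hypothetical stub: calls back asynchronously with a canned response
const fakeSearch = {
  json(params, callback) {
    setTimeout(() => callback({ product_results: { product_id: params.product_id } }), 0);
  },
};

// Wraps the callback-style method in a promise so it can be awaited
const getJson = (search, params) =>
  new Promise((resolve) => {
    search.json(params, resolve);
  });

const jsonPromise = getJson(fakeSearch, { product_id: "123" });
jsonPromise.then((json) => console.log(json.product_results.product_id)); // prints "123"
```

This wrapper never calls reject, so any error handling has to happen elsewhere, for example by inspecting the returned object before using it.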
And finally, we declare the function getResults that gets data from the page and returns it:
const getResults = async () => {
...
};
In this function we get json with the results and return an object with data from the received json using spread syntax:
const json = await getJson();
return { ...json.product_results, specsResults: json.specs_results };
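To make the effect of the spread concrete, here is a sketch with a made-up response that has only the two fields used above. The sample values are assumptions, not a real API response:

```javascript
// Made-up response shaped like the two fields used above
const json = {
  product_results: { title: "Apple iPhone 12 Pro - 128 GB - Silver - Unlocked", rating: 4.5 },
  specs_results: { general: { product_type: "Smartphone" } },
};

// Copies every product_results key to the top level, then adds specsResults
const result = { ...json.product_results, specsResults: json.specs_results };

console.log(Object.keys(result)); // [ 'title', 'rating', 'specsResults' ]
```

The spread makes a shallow copy: result.specsResults points to the same object as json.specs_results, so changing one changes the other.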
After that, we run the getResults function and print all the received information in the console with the console.dir method, which accepts an options object that changes the default output options:
getResults().then((result) => console.dir(result, { depth: null }));
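In Node.js, console.dir formats its argument with util.inspect, so the effect of depth: null can be seen by calling util.inspect directly. A minimal sketch with a made-up nested object:

```javascript
const util = require("util");

// Made-up object nested four levels deep
const nested = { a: { b: { c: { d: 1 } } } };

// The default depth is 2, so objects nested deeper are abbreviated as [Object]
const shallow = util.inspect(nested);
// depth: null removes the limit and prints the whole structure
const full = util.inspect(nested, { depth: null });

console.log(shallow);
console.log(full);
```

Without depth: null, the deepest levels of a nested result would be printed as [Object] instead of their actual values.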
Output
{
"product_id":14938360545167500000,
"title":"Apple iPhone 12 Pro - 128 GB - Silver - Unlocked",
"reviews":5109,
"rating":4.5,
"extensions":[
"Smartphone",
"Dual SIM",
"iOS",
"5G",
"With Wireless Charging",
"Triple Lens",
"GSM",
"CDMA",
"With OLED Display",
"Facial Recognition"
],
"description":"Apple · iPhone · iPhone 12 · iPhone 12 Pro · iOS · 6.1′′ · Facial Recognition · 12 MP front camera · 12 MP rear camera · Smartphone · With Wireless Charging. Beautifully bright 6.1-inch Super Retina XDR display. Ceramic Shield with 4x better drop performance. Incredible low-light photography with a new Pro camera system, and 4x optical zoom range. Cinema-grade Dolby Vision video recording, editing, and playback. Night mode portraits and next-level AR experiences with the LiDAR Scanner. Powerful A14 Bionic chip. 5G capable. And new MagSafe accessories for easy attach and faster wireless charging. For infinitely spectacular possibilities. Legal. The display has rounded corners. When measured as a rectangle, the screen is 6.06 inches diagonally. Actual viewable area is less. Claim based on iPhone 12 Pro Ceramic Shield front compared with previous-generation iPhone. Data plan required. 5G is available in select markets and through select carriers. Speeds vary based on site conditions and carrier. Accessories are sold separately. Apple ProRAW coming soon. iPhone 12 Pro is splash, water, and dust resistant and was tested under controlled laboratory conditions with a rating of IP68 under IEC standard 60529 (maximum depth of 6 meters up to 30 minutes). Splash, water, and dust resistance are not permanent conditions. Resistance might decrease as a result of normal wear. Do not attempt to charge a wet iPhone; refer to the user guide for cleaning and drying instructions.",
"specsResults":{
"general":{
"product_type":"Smartphone",
"manufacturer_model_number":"A2341",
"form_factor":"Touch",
...and other specs
},
"cellular":{
"technology":"CDMA2000 1X / GSM / WCDMA (UMTS)",
"mobile_broadband_generation":"5G",
"service_provider":"Not specified",
...and other specs
},
...and other categories
}
}
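Note that product_id in this output ends in ...500000, while the ID passed in the request ends in ...499200. The likely cause is that the ID is larger than Number.MAX_SAFE_INTEGER, so it loses precision once it is handled as a JavaScript number. Here is a short sketch of the effect, and of keeping the ID as a string or BigInt instead:

```javascript
const idString = "14938360545167499200";

// Converting to a Number rounds to the nearest representable double
const asNumber = Number(idString);
console.log(String(asNumber)); // "14938360545167500000"
console.log(Number.isSafeInteger(asNumber)); // false

// A BigInt (or the original string) keeps every digit
const asBigInt = BigInt(idString);
console.log(asBigInt.toString()); // "14938360545167499200"
```

If you need to reuse the ID in later requests, it is safer to take it from your own params object than from the numeric product_id field.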
Links
If you want to see some projects made with SerpApi, write me a message.
Add a Feature Request💫 or a Bug🐞