No-Code Scraping: Using No-Code Tools to Scrape an eCommerce Site and Send an Alert Text Using Zyte API, n8n, and Telegram
Pratik Parmar
Posted on February 21, 2023
As an audiophile, I am always on the lookout for deals on headphones, speakers and quality AV. Headphone Zone is my go-to place for such purchases, as they regularly host clearance sales and I love a deal too.🤑
However, I often miss out on these sales because the emails announcing them get lost in the flood of promotional emails and spam I receive, and I don’t want to visit these sites every day. So I’d much rather have an alert sent to my Telegram account when there is a sale.
So, being the geek I am, I decided to see if I could use some simple no-code tools (because why not) and a web scraping API (Zyte API) to create a system that sends me a notification whenever there is a clearance sale on Headphone Zone.
This way, I can be sure not to miss out on any great deals on audio devices.
The good news is it was surprisingly easy, so I’m going to show you how to do it too. Sounds fun, right? Let's get started. 🚀
How it's going to work:
What you’re going to do is:
- Set up a no-code tool (n8n) to manage the various tools and tasks needed
- Set up a web scraping API and scrape the site every day to monitor it
- Detect if there is a sale on
- Send an alert to your phone via Telegram
1. Introduction to n8n and Zyte API
n8n is a no-code workflow automation tool. It allows you to automate data-driven processes and connect your apps into a single workflow. n8n provides a drag-and-drop user interface that enables you to build workflows without writing any code.
Zyte API is an API that aims to solve all web data extraction needs. It comes with a built-in, transparent anti-ban solution with IP rotation, browser emulation, and website-specific fine-tuning. In a nutshell, Zyte API will ensure you don't get ban-hammered and will deliver your data without any hiccups.
2. Prerequisites: Setting up n8n and acquiring a Zyte API key
2.1 n8n:
The easiest way to get started with n8n is the desktop app. Check out the quickstart guide for more information.
2.2 Zyte API Key:
For Zyte API, you just need to sign up at https://app.zyte.com/account/signup/zyteapi and fetch the Zyte API key. For a step-by-step guide, you can refer to this guide. Keep this API key handy, as we will be using it later in the workflow.
2.3 Telegram Bot Credentials (optional):
On Telegram, chat with BotFather and create a new bot. BotFather will provide a bot token, which can be used to integrate the bot into any platform.
One last thing we’ll need is a Telegram Chat ID.
Check out this guide to learn how to get a chat ID.
You can skip this step if you don’t want to use the Telegram node.
3. How the web scraping workflow works in n8n
- Pass the website URL you want to scrape in the cURL request. Here it’s https://www.headphonezone.in/collections/clearance.
- Using the HTTP Request node, make a Zyte API call to fetch the HTML content.
- Use the HTML Extract node to extract data from the HTML content.
- Clean the data and send it over using the Telegram node.
4. Configuring the workflow
To create a new n8n workflow, just head over to the workflow and click on new.
The n8n workflow is made up of small executable blocks known as nodes.
You can add a node by clicking on the "+" button in the top right corner of the n8n dashboard and selecting it from the list of available nodes.
4.1 Getting the website data
In order to download the HTML data from the website we will need an HTTP Request node. After adding it to your workflow, you will need to specify the website URL you want to scrape. You can also specify any other request parameters, such as the method, headers, and body, as needed.
Alternatively, you can use cURL to configure this node. Click on Import cURL, paste the following cURL request, and import it. Make sure you replace YOUR_ZYTE_API_KEY_HERE with your own Zyte API key.
curl \
--user YOUR_ZYTE_API_KEY_HERE: \
--header 'Content-Type: application/json' \
--data '{"url": "https://www.headphonezone.in/collections/clearance", "browserHtml": true}' \
https://api.zyte.com/v1/extract
In the node configuration, go to options and add a Response option. Set the response format as Text and set Put Output in Field as response.
Click on execute to ensure everything is working properly.
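If you are curious what the HTTP Request node is doing under the hood, here is a rough Python sketch of the same call using only the standard library. The function names build_zyte_request and fetch_browser_html are my own for illustration, not part of n8n or Zyte. The `--user KEY:` part of the cURL command is plain HTTP Basic auth with the API key as the username and an empty password.

```python
import base64
import json
import urllib.request

ZYTE_ENDPOINT = "https://api.zyte.com/v1/extract"


def build_zyte_request(api_key, page_url):
    """Build (but do not send) the POST request described by the cURL command."""
    payload = json.dumps({"url": page_url, "browserHtml": True}).encode("utf-8")
    # `--user KEY:` means Basic auth: key as username, empty password.
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        ZYTE_ENDPOINT,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def fetch_browser_html(api_key, page_url):
    """Send the request and return the rendered HTML (needs network and a valid key)."""
    with urllib.request.urlopen(build_zyte_request(api_key, page_url)) as resp:
        return json.load(resp)["browserHtml"]
```

Calling fetch_browser_html("YOUR_ZYTE_API_KEY_HERE", "https://www.headphonezone.in/collections/clearance") should return the same HTML the node receives, assuming the key is valid.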
4.2 Fetch HTML Content using a Set Node
As you may have noticed, the response from the HTTP Request node is JSON. For our workflow, we only need browserHtml though, which is the HTML content of the webpage. We can use the Set node to create another field, ‘data’, and assign browserHtml to that field.
- Keep Only Set: Enable
- Values to Set: String
  - Name: data
  - Value: {{$json["response"]["browserHtml"]}}
This node reads the browserHtml field from the JSON stored in the response field and saves it as a string in the new data field.
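For the code-minded, the Set node step boils down to something like the sketch below. keep_only_data is a hypothetical helper name, and I am assuming the response field can arrive either as raw JSON text or as an already-parsed object, so the code handles both.

```python
import json


def keep_only_data(item):
    """Mimic the Set node: keep a single 'data' field holding browserHtml."""
    response = item["response"]
    # Assumption: the field may be raw JSON text or an already-parsed object.
    if isinstance(response, str):
        response = json.loads(response)
    # "Keep Only Set" drops every other field from the item.
    return {"data": response["browserHtml"]}
```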
4.3 Extracting Product Data from the HTML Data
In the HTML Extract node, you can use CSS selectors to extract the data you want from the HTML response received by the HTTP Request node. You can specify the element or attribute you want to extract by entering a CSS selector in the Selector field.
- Node: HTML
- Source Data: JSON
- JSON Property: data
We’ve stored the browserHtml in the data variable in the previous Set node.
Extraction Values
- Key: products
- CSS Selector: .product-item
  .product-item is the main element under which all product details are stored.
- Return Value: HTML
- Return Array: Enable
  Store the response as HTML in the form of an array.
After execution, the output should be a products array holding the HTML of every product on the page.
So now we have the HTML data of all products. But what we want is human-readable information for each product individually.
So let’s separate the products out first using the Item Lists node.
- Node: Item Lists
- Operation: Split Out Items
- Field To Split Out: products
- Include: No Other Fields
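To picture what Split Out Items does, here is a tiny Python sketch. split_out_items is just an illustrative name and the sample HTML strings are made up. One item holding an array becomes one item per array element, and with No Other Fields each new item carries only the field that was split.

```python
def split_out_items(item, field):
    """Turn one item holding a list into one item per list element."""
    # "No Other Fields": each new item carries only the split-out field.
    return [{field: value} for value in item[field]]


# Made-up sample: one item with two product fragments and an extra field.
scraped = {"products": ["<div>first</div>", "<div>second</div>"], "page": 1}
for product in split_out_items(scraped, "products"):
    print(product)
```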
4.4 Extracting Individual Products
Alrighty, now we have a products field for each item, which contains the data of one product in HTML format. All we need to do is extract the product information using the HTML Extract node.
- Node: HTML Extract
- Source Data: JSON
- JSON Property: products
Extraction Values:
- Extract Product Name
  - Key: name
  - CSS Selector: .product-item-meta__title
  - Return Value: Text
- Extract Product URL
  - Key: url
  - CSS Selector: .product-item-meta__title
  - Return Value: Attribute
  - Attribute: href
- Extract Product Price
  - Key: price
  - CSS Selector: .price--highlight
  - Return Value: Text
After executing the node, it should display the individual product details.
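If you want to see roughly what these three extraction values do, here is a minimal Python sketch built on the standard library's html.parser. It is not how n8n implements the node, and it only matches on class names rather than full CSS selectors. The SAMPLE fragment is my own guess at the markup shape, built from the class names above, and not copied from the site.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

# Hypothetical fragment shaped like one .product-item element.
SAMPLE = """
<div class="product-item">
  <a class="product-item-meta__title" href="/products/tin-hifi-t5">TIN HiFi - T5</a>
  <span class="price price--highlight"><span>Sale price</span>₹ 9,999</span>
</div>
"""


class ProductParser(HTMLParser):
    """Collect name, url, and price from one product's HTML fragment."""

    def __init__(self):
        super().__init__()
        self.fields = {"name": "", "url": "", "price": ""}
        self._target = None  # field currently collecting text
        self._depth = 0      # open tags inside the matched element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void tags never get a closing tag
        if self._target:
            self._depth += 1
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "product-item-meta__title" in classes:
            self.fields["url"] = attrs.get("href") or ""
            self._target, self._depth = "name", 1
        elif "price--highlight" in classes:
            self._target, self._depth = "price", 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._target:
            return
        self._depth -= 1
        if self._depth == 0:
            self._target = None

    def handle_data(self, data):
        if self._target:
            self.fields[self._target] += data


def parse_product(fragment):
    """Return name, url, and price with whitespace collapsed."""
    parser = ProductParser()
    parser.feed(fragment)
    parser.close()
    return {key: " ".join(value.split()) for key, value in parser.fields.items()}
```

Running parse_product(SAMPLE) gives one dictionary with the name, url, and price keys, which is the same shape of item the node hands to the next step.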
4.5 Send Message on Telegram
Phew, that was fun! But hey, there’s more! Time to add a Telegram node to send messages through the Telegram bot we created earlier.
First of all, add credentials for Telegram API and provide a bot token.
- Node: Telegram
- Resource: Message
- Operation: Send Message
- Chat ID: Provide the Telegram chat ID we received earlier.
- Text:
{{ $json["name"] }}
https://www.headphonezone.in{{ $json["url"] }}
{{ $json["price"] }}
This text will display a message in this format on Telegram:
TIN HiFi - T5
https://www.headphonezone.in/products/tin-hifi-t5
Sale price₹ 9,999
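Outside n8n, the same message could be put together and posted with the Telegram Bot API's sendMessage method, roughly like this. format_message and build_telegram_request are illustrative names of my own, and the sketch only builds the request, so you would still need to send it with your own bot token and chat ID. Using urljoin avoids a doubled slash when the scraped href already starts with "/".

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.headphonezone.in/"


def format_message(product):
    """Build the three-line alert text: name, absolute link, price."""
    link = urllib.parse.urljoin(BASE_URL, product["url"])
    return f"{product['name']}\n{link}\n{product['price']}"


def build_telegram_request(bot_token, chat_id, text):
    """Build (but do not send) a sendMessage request for the Telegram Bot API."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


product = {
    "name": "TIN HiFi - T5",
    "url": "/products/tin-hifi-t5",
    "price": "Sale price₹ 9,999",
}
print(format_message(product))
```

Passing the result of build_telegram_request to urllib.request.urlopen would actually deliver the message.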
And voila! If everything worked well, you should be able to receive messages on the Telegram bot.
4.6 Automate the workflow
Currently, we still need to execute the workflow manually.
What’s the point of having an automation workflow if it needs a manual trigger? Fortunately, n8n also has a Cron node. Let’s add that node to the workflow and remove the Start node.
Here I want to execute this workflow on the first day of every month at midnight, so this is what the configuration looks like:
- Node: Cron
- Trigger Times:
  - Mode: Every Month
  - Hour: 0
  - Minute: 0
  - Day of the Month: 1
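In classic five-field cron syntax, this schedule corresponds to the expression `0 0 1 * *`. If you want to sanity-check when the trigger fires next, a small Python sketch like this does the job. next_monthly_run is a hypothetical helper of mine, and it ignores time zones.

```python
from datetime import datetime


def next_monthly_run(now):
    """Next firing time for 'day 1 of every month at 00:00', strictly after now."""
    # This month's run (the 1st at 00:00) is never later than `now`,
    # so the next run is always the 1st of the following month.
    year, month = now.year, now.month + 1
    if month > 12:
        year, month = year + 1, 1
    return datetime(year, month, 1, 0, 0)


print(next_monthly_run(datetime(2023, 2, 21, 10, 30)))
```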
4.7 Activate the workflow
Finally, we’re ready to deploy our workflow. Before that, make sure that you have followed all the steps of the tutorial. Your workflow should chain these nodes in order: Cron, HTTP Request, Set, HTML Extract, Item Lists, HTML Extract, and Telegram. You can activate the workflow from the top right corner of the n8n app, and you’re done! Your workflow is now active and will notify you when there’s any offer available.
5. What’s Next?
Our workflow is still hosted on our desktop / local machine. Hence, if your computer is off during the trigger time, then the workflow won’t work and you will miss out on the amazing deals.
You can check out the n8n cloud to deploy your workflow on the cloud.
Also, our workflow only scrapes data from the first page of the clearance section. To cover the remaining pages, it needs pagination logic.
What’s the solution then?
You can try the Zyte Auto Extract API, which takes care of the scraping logic, pagination, and every painful aspect of the scraping.
Let us know in the comments if you want us to create another tutorial on that.
Till then, happy scraping! This is me, Pratik Parmar, signing off.
Over and out!