No Code Scraping: Using No Code tools to Scrape an eCommerce site and send an alert text using Zyte API, n8n, and Telegram

hackyroot

Pratik Parmar

Posted on February 21, 2023


As an audiophile, I am always on the lookout for deals on headphones, speakers, and quality AV gear. Headphone Zone is my go-to store for such purchases, as they regularly host clearance sales, and I love a deal too. 🤑
However, I often miss these sales because the announcement emails get lost in the flood of promotional emails and spam I receive, and I don’t want to visit the site every day. I’d much rather have an alert sent to my Telegram account when there is a sale.

So, geek that I am, I decided to see if I could use some simple no-code tools (because why not) and a web scraping API (Zyte API) to build a system that notifies me whenever there is a clearance sale on Headphone Zone.

This way, I can be sure not to miss out on any great deals on audio devices.
The good news is that it was surprisingly easy, so I’m going to show you how to do it too. Sounds fun, right? Let's get started. 🚀

How it's going to work:

What you’re going to do is:

  • Set up a no-code tool (n8n) to orchestrate the various tools and tasks
  • Set up a web scraping API (Zyte API) and scrape the site on a schedule
  • Detect whether there is a sale on
  • Send an alert to your phone via Telegram

1. Introduction to n8n and Zyte API

n8n is a no-code workflow automation tool. It allows you to automate data-driven processes and connect your apps into a single workflow. n8n provides a drag-and-drop user interface that enables you to build workflows without writing any code.

Zyte API is a web scraping API that aims to cover all your web data extraction needs. It comes with a built-in, transparent anti-ban solution with IP rotation, browser emulation, and website-specific fine-tuning. In a nutshell, Zyte API keeps you from getting ban-hammered and delivers your data without hiccups.

2. Prerequisites: Setting up n8n and acquiring a Zyte API key

2.1 n8n:

The easiest way to get started with n8n is the desktop app. Check out the n8n quickstart guide for more information.

n8n Workflow

2.2 Zyte API Key:

For Zyte API, you just need to sign up at https://app.zyte.com/account/signup/zyteapi and copy your Zyte API key. For a step-by-step walkthrough, you can refer to this guide. Keep the API key handy; we will use it later in the workflow.

Acquire Zyte API Key

2.3 Telegram Bot Credentials (optional):

On Telegram, chat with BotFather and create a new bot. BotFather will provide a bot token, which can be used to integrate the bot into any platform.

One last thing we’ll need is a Telegram Chat ID.
Check out this guide to learn how to get a chat ID.

You can skip this step if you don’t want to use the Telegram node.
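If you'd rather look up the chat ID yourself, the Telegram Bot API's getUpdates method lists recent messages sent to your bot, including the ID of each chat. The Python sketch below (standard library only) builds the getUpdates URL and pulls the chat IDs out of a response shaped like the Bot API's JSON. The helper names are hypothetical, not part of n8n or this tutorial, and you need to send your bot a message first, otherwise the result list is empty.

```python
import json
import urllib.request

# Hypothetical helper names; only the getUpdates endpoint comes from the Telegram Bot API.


def get_updates_url(bot_token: str) -> str:
    """Telegram Bot API endpoint that lists recent updates sent to your bot."""
    return f"https://api.telegram.org/bot{bot_token}/getUpdates"


def chat_ids_from_updates(updates: dict) -> list[int]:
    """Collect the distinct chat IDs from a getUpdates JSON response.

    Only regular messages are inspected; other update types
    (edited messages, callback queries, etc.) are skipped in this sketch.
    """
    ids: list[int] = []
    for update in updates.get("result", []):
        chat_id = update.get("message", {}).get("chat", {}).get("id")
        if chat_id is not None and chat_id not in ids:
            ids.append(chat_id)
    return ids


def fetch_chat_ids(bot_token: str) -> list[int]:
    """Call getUpdates for real (needs network access and a valid bot token)."""
    with urllib.request.urlopen(get_updates_url(bot_token)) as resp:
        return chat_ids_from_updates(json.loads(resp.read()))
```

The ID printed by fetch_chat_ids for your private chat with the bot is the value the Telegram node's Chat ID field expects.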

3. How the web scraping workflow works in n8n

  • Pass the URL of the page you want to scrape in the cURL request. Here it’s https://www.headphonezone.in/collections/clearance.
  • Use the HTTP Request node to call Zyte API and fetch the HTML content.
  • Use the HTML Extract node to extract data from the HTML content.
  • Clean the data and send it using the Telegram node.

4. Configuring the workflow

To create a new n8n workflow, head over to the Workflows section and click New.

Create new n8n workflow and add a new node

The n8n workflow is made up of small executable blocks known as nodes.

You can add a node by clicking the "+" button in the top-right corner of the n8n editor and selecting it from the list of available nodes.

4.1 Getting the website data

To download the HTML from the website, we need an HTTP Request node. After adding it to your workflow, configure it to call Zyte API, passing the URL of the page you want to scrape in the request body. You can also set any other request parameters, such as the method, headers, and body, as needed.

Alternatively, you can configure this node with cURL: click Import cURL, paste the following cURL request, and import it. Make sure you replace YOUR_ZYTE_API_KEY_HERE with your own Zyte API key.

curl \
   --user YOUR_ZYTE_API_KEY_HERE: \
   --header 'Content-Type: application/json' \
   --data '{"url": "https://www.headphonezone.in/collections/clearance", "browserHtml": true}' \
   https://api.zyte.com/v1/extract

In the node configuration, go to Options and add a Response option. Set Response Format to Text and set Put Output in Field to response.

Execute the node to make sure everything is working properly.

HTTP Request Node configuration
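For reference, here is a rough Python equivalent of the cURL request above, using only the standard library. It is a sketch, not what n8n runs internally, and build_zyte_request and fetch_browser_html are hypothetical helper names. The API key is sent as the HTTP Basic auth username with an empty password, which is what `--user KEY:` does in cURL.

```python
import base64
import json
import urllib.request

# Zyte API extraction endpoint, as used in the cURL command above.
ZYTE_ENDPOINT = "https://api.zyte.com/v1/extract"


def build_zyte_request(api_key: str, page_url: str) -> urllib.request.Request:
    """Build the same POST request as the cURL command (hypothetical helper)."""
    body = json.dumps({"url": page_url, "browserHtml": True}).encode("utf-8")
    # cURL's `--user KEY:` means Basic auth with the key as username
    # and an empty password.
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        ZYTE_ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_browser_html(api_key: str, page_url: str) -> str:
    """Send the request and return the rendered HTML (needs network access)."""
    with urllib.request.urlopen(build_zyte_request(api_key, page_url)) as resp:
        return json.loads(resp.read())["browserHtml"]
```

With a real key, fetch_browser_html("YOUR_ZYTE_API_KEY_HERE", "https://www.headphonezone.in/collections/clearance") should return the same HTML string that appears under browserHtml in the node output.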

4.2 Fetch HTML Content using a Set Node

As you may have noticed, the HTTP Request node returns JSON. For our workflow, though, we only need browserHtml, which holds the HTML content of the rendered page. We can use the Set node to create a new field, data, and assign browserHtml to it.

  • Keep Only Set: Enable
  • Values to Set:
      • String
          • Name: data
          • Value: {{$json["response"]["browserHtml"]}}

This node reads the browserHtml field from the JSON in the response field and stores it, as a string, in the data field.

Configure Set Node
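In plain Python, the Set node step amounts to picking one field out of the parsed response and dropping the rest. This is a minimal sketch with a hypothetical helper name; it also assumes the response field might hold the raw JSON text rather than an object, and parses it in that case.

```python
import json


def keep_only_browser_html(item: dict) -> dict:
    """Mimic the Set node (hypothetical helper): copy response.browserHtml
    into a new `data` field and drop every other field ("Keep Only Set")."""
    response = item["response"]
    # Assumption: depending on the response format, the field may hold the
    # raw JSON text instead of an object, so parse it in that case.
    if isinstance(response, str):
        response = json.loads(response)
    return {"data": str(response["browserHtml"])}
```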

4.3 Extracting Product Data from the HTML Data

In the HTML Extract node, you can use CSS selectors to extract the data you want from the HTML content. Enter a CSS selector in the Selector field to specify the element or attribute you want to extract.

  • Node: HTML
  • Source Data: JSON
  • JSON Property: data

We stored browserHtml in the data field in the previous Set node.

Extraction Values

  • Key: products
  • CSS Selectors: .product-item

.product-item is the main element under which all product details are stored.

  • Return Value: HTML
  • Return Array: Enable

This returns each matching element as HTML, collected in an array.
After execution, the output should look like this:

Extract product list using
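To make the `.product-item` step concrete, here is a rough standard-library Python sketch of what this extraction does: it collects the outer HTML of every element whose class list contains a given class. ClassBlockExtractor and extract_blocks are hypothetical names, and the sketch assumes reasonably well-formed markup (every non-void element is closed); the n8n node uses a real CSS selector engine, which this does not replicate.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not deepen the nesting.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassBlockExtractor(HTMLParser):
    """Collect the outer HTML of each element whose class list contains
    `css_class` (hypothetical helper, roughly `.product-item` with
    Return Value: HTML and Return Array enabled)."""

    def __init__(self, css_class: str):
        # Keep entities such as &amp; as-is so the captured HTML stays intact.
        super().__init__(convert_charrefs=False)
        self.css_class = css_class
        self.blocks: list[str] = []
        self._buf: list[str] = []
        self._depth = 0  # nesting level inside the current match; 0 = outside

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._buf.append(self.get_starttag_text())
            if tag not in VOID_TAGS:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes and tag not in VOID_TAGS:
            self._depth = 1
            self._buf = [self.get_starttag_text()]

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <img ... /> never open a new level.
        if self._depth:
            self._buf.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._depth or tag in VOID_TAGS:
            return
        self._buf.append(f"</{tag}>")
        self._depth -= 1
        if self._depth == 0:
            self.blocks.append("".join(self._buf))

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)

    def handle_entityref(self, name):
        if self._depth:
            self._buf.append(f"&{name};")

    def handle_charref(self, name):
        if self._depth:
            self._buf.append(f"&#{name};")


def extract_blocks(html: str, css_class: str) -> list[str]:
    """Return the outer HTML of every matching element, in document order."""
    parser = ClassBlockExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.blocks
```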

We now have the HTML for all products. But what we want is human-readable information for each product individually.

So let’s first split the list into separate items using the Item Lists node.

  • Node: Item Lists
  • Operations: Split Out Items
  • Field To Split Out: products
  • Include: No Other Fields
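Split Out Items turns one item holding an array into one item per array element. A one-function Python sketch of that behavior (hypothetical name; "Include: No Other Fields" is modeled by dropping every other field):

```python
def split_out_items(items: list[dict], field: str) -> list[dict]:
    """Mimic Item Lists > Split Out Items with "Include: No Other Fields"
    (hypothetical helper): each element of item[field] becomes its own item,
    and all other fields are dropped."""
    return [{field: value} for item in items for value in item.get(field, [])]
```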


4.4 Extracting Individual Products

Alrighty, each item now has a products field containing one product's data in HTML format. All we need to do is extract the product information using another HTML Extract node, this time with Source Data set to JSON and JSON Property set to products.

Extraction Values:

  • Extract Product Name
      • Key: name
      • CSS Selector: .product-item-meta__title
      • Return Value: Text
  • Extract Product URL
      • Key: url
      • CSS Selector: .product-item-meta__title
      • Return Value: Attribute
      • Attribute: href
  • Extract Product Price
      • Key: price
      • CSS Selector: .price--highlight
      • Return Value: Text


After executing the node, it should display the individual product details.
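The three extraction values above can be approximated in Python with the standard-library HTMLParser. This is a sketch under assumptions: it uses the first element matching each class, joins all text inside it (so a visually hidden label such as "Sale price" is included, as in the Telegram example below), and collapses whitespace. ProductFieldParser and extract_product are hypothetical names, and the sample markup in the tests is made up for illustration, not copied from Headphone Zone.

```python
from html.parser import HTMLParser
from typing import Optional

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# CSS class -> output key, mirroring the Key / CSS Selector pairs above.
TEXT_FIELDS = {"product-item-meta__title": "name", "price--highlight": "price"}


class ProductFieldParser(HTMLParser):
    """Pull name, url and price out of one product's HTML (hypothetical
    helper). Only the first element matching each class is used."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes &amp; etc. in text
        self.fields = {"name": "", "url": "", "price": ""}
        self._done: set = set()  # fields whose element has already closed
        self._stack: list = []  # open elements as (tag, field or None)

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        field: Optional[str] = None
        for css_class in (attr_map.get("class") or "").split():
            candidate = TEXT_FIELDS.get(css_class)
            already_open = any(f == candidate for _, f in self._stack)
            if candidate and candidate not in self._done and not already_open:
                field = candidate
                break
        if field == "name":
            # The title element is the product link, so its href is the URL.
            self.fields["url"] = attr_map.get("href") or ""
        if tag not in VOID_TAGS:
            self._stack.append((tag, field))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                for _, field in self._stack[i:]:
                    if field:
                        self._done.add(field)
                del self._stack[i:]
                break

    def handle_data(self, data):
        # Text counts toward every matched element that is still open.
        for _, field in self._stack:
            if field and field not in self._done:
                self.fields[field] += data


def extract_product(product_html: str) -> dict:
    parser = ProductFieldParser()
    parser.feed(product_html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    return {key: " ".join(value.split()) for key, value in parser.fields.items()}
```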

4.5 Send Message on Telegram

Phew, that was fun! But hey, there’s more! Time to add a Telegram node so the bot we created earlier can send us messages.

First, add credentials for the Telegram API and provide the bot token from BotFather.

  • Node: Telegram
  • Resource: Message
  • Operation: Send Message
  • Chat ID: the Telegram chat ID you obtained earlier
  • Text:
{{ $json["name"] }}
https://www.headphonezone.in{{ $json["url"] }}
{{ $json["price"] }}

The extracted href already starts with a slash (/products/...), so the base URL is written without a trailing slash to avoid a double slash in the link.

This produces a message in this format on Telegram:

TIN HiFi - T5
https://www.headphonezone.in/products/tin-hifi-t5
Sale price₹ 9,999


And voila! If everything worked well, you should be able to receive messages on the Telegram bot.
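Outside n8n, the same message can be assembled and sent through the Telegram Bot API's sendMessage method. The Python sketch below only builds the request (send it with urllib.request.urlopen once you plug in a real bot token and chat ID); format_message and build_send_message_request are hypothetical names. It uses urljoin to combine the base URL with the relative href, which yields a single slash between them.

```python
import json
import urllib.request
from urllib.parse import urljoin

BASE_URL = "https://www.headphonezone.in"


def format_message(product: dict) -> str:
    """Build the same three-line text as the Telegram node template
    (hypothetical helper). urljoin handles the leading slash in the href."""
    return "\n".join([
        product["name"],
        urljoin(BASE_URL, product["url"]),
        product["price"],
    ])


def build_send_message_request(bot_token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Prepare a POST to the Bot API sendMessage method (hypothetical helper)."""
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"https://api.telegram.org/bot{bot_token}/sendMessage",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```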


4.6 Automate the workflow

Currently, we still need to execute the workflow manually.

What’s the point of an automation workflow if it needs a manual trigger? Fortunately, n8n also has a Cron node. Let’s add that node to the workflow and remove the Start node.

Here, I want to execute this workflow on the first day of every month at midnight, so the configuration looks like this:

Trigger Times:

  • Mode: Every Month
  • Hour: 0
  • Minute: 0
  • Day of the Month: 1
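The trigger above corresponds to the standard cron expression `0 0 1 * *` (minute 0, hour 0, day of month 1, every month, any weekday). As a sanity check, here is a small Python sketch (hypothetical function name) that computes the next time such a monthly trigger would fire; it assumes the day of the month is 28 or lower, so that it exists in every month.

```python
from datetime import datetime


def next_monthly_run(now: datetime, day: int = 1, hour: int = 0, minute: int = 0) -> datetime:
    """Next firing time strictly after `now` for a monthly trigger
    (hypothetical helper). Assumes day <= 28 so it exists in every month."""
    candidate = now.replace(day=day, hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # This month's slot has passed: roll over to the same slot next month.
        year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
        candidate = candidate.replace(year=year, month=month)
    return candidate
```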


4.7 Activate the workflow

Finally, we’re ready to deploy our workflow. Before that, make sure you’ve followed all the steps in this tutorial; your workflow should look like this. You can activate the workflow from the top-right corner of the n8n app, and you’re done! Your workflow is now active and will send you the products listed in the clearance section each time it runs.


5. What’s Next?

Our workflow still runs on our desktop/local machine, so if your computer is off at the trigger time, the workflow won’t run and you will miss out on the amazing deals.

You can check out n8n Cloud to deploy your workflow in the cloud.

Also, our workflow only scrapes data from the first page of the clearance section. To cover every page, it would also need pagination logic.
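One possible approach, sketched in Python: request ?page=1, ?page=2, and so on until a page comes back with no products. Both the ?page=N parameter (common on Shopify-style stores, but not confirmed for Headphone Zone in this tutorial) and the fetch_products callback (for example, the Zyte API call plus the `.product-item` extraction) are assumptions, and max_pages caps the loop in case a site keeps returning results.

```python
from typing import Callable
from urllib.parse import urlencode


def collect_all_pages(collection_url: str,
                      fetch_products: Callable[[str], list],
                      max_pages: int = 20) -> list:
    """Walk ?page=1, ?page=2, ... until a page yields no products.

    `fetch_products` is a hypothetical callback: given a page URL, it returns
    the product entries on that page. The ?page=N scheme is an assumption
    about how the store paginates its collections.
    """
    products: list = []
    for page in range(1, max_pages + 1):
        page_url = f"{collection_url}?{urlencode({'page': page})}"
        batch = fetch_products(page_url)
        if not batch:
            break  # an empty page means we ran past the last one
        products.extend(batch)
    return products
```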

What’s the solution then?

You can try the Zyte Auto Extract API, which takes care of the scraping logic, pagination, and every other painful aspect of scraping.

Let us know in the comments if you'd like us to create another tutorial on that.

Till then, happy scraping! This is Pratik Parmar, signing off.
Over and out!
