Deno Web Scraper

siddacool

Siddhesh Mangela

Posted on November 30, 2020

You might have created a web scraper with a Node.js + request + cheerio setup, or maybe a Python one using Beautiful Soup. This tutorial brings the same to the world of Deno.

In this example, we scrape the list of books from http://books.toscrape.com/

Let's get started, without further ado.

Step 01: app.ts

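To start, we will create an app.ts file and wrap the whole code in a try-catch block. Deno supports top-level await, so we can use await directly at the top level of the file without an async wrapper function, and the try-catch handles any errors.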

const url = 'http://books.toscrape.com/';

try {
  console.log(url)
} catch(error) {
  console.log(error);
}

Check that the code logs the URL by running the following command in the terminal:

deno run app.ts

Step 02: Fetch URL

Deno supports many native web APIs, including the Fetch API, which makes request handling easy and dependency-free. The response body is read as text and saved in a variable named html.

const url = 'http://books.toscrape.com/';

try {
  const res = await fetch(url);
  const html = await res.text();

  console.log(html)
} catch(error) {
  console.log(error);
}

Deno is secure by default, which means that to let the script access the internet we need to run it with the --allow-net flag.

Check that the code logs the HTML by running the following command in the terminal:

deno run --allow-net app.ts
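
The fetch call above resolves even when the server answers with an error status, so a scraper may want to check the response before reading the body. The following is a minimal sketch of that idea, not part of the original script: fetchHtml is a hypothetical helper name, and the fetchFn parameter is an assumption added only so the helper can be tried without network access.

```typescript
// Hypothetical helper (not in the original script): fetch a page and fail
// loudly on non-2xx responses instead of returning an error page's HTML.
// `fetchFn` is injectable only so the helper can be exercised offline.
async function fetchHtml(
  url: string,
  fetchFn: (input: string) => Promise<Response> = fetch,
): Promise<string> {
  const res = await fetchFn(url);

  // `res.ok` is true only for status codes in the 200-299 range.
  if (!res.ok) {
    throw new Error(`Request to ${url} failed with status ${res.status}`);
  }

  return await res.text();
}
```

Inside the try block, const html = await fetchHtml(url); could then stand in for the two fetch lines, and a failed request would end up in the catch block.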

Step 03: Deno Dom

Deno DOM (deno-dom) makes it easy to traverse HTML using the familiar JavaScript DOM methods.

The HTML text that we get with fetch is parsed with DOMParser, and the resulting document is stored in the variable doc. We then query doc to extract the page heading from the target site.

import { DOMParser } from 'https://deno.land/x/deno_dom/deno-dom-wasm.ts';

const url = 'http://books.toscrape.com/';

try {
  const res = await fetch(url);
  const html = await res.text();
  const doc: any = new DOMParser().parseFromString(html, 'text/html');

  const pageHeader = doc.querySelector('.header').querySelector('.h1').textContent;

  console.log(pageHeader)
} catch(error) {
  console.log(error);
}

Check that the code logs “Books to Scrape We love being scraped!” by running the following command in the terminal:

deno run --allow-net app.ts
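
Chained querySelector calls like the one above throw a TypeError if any selector matches nothing, because querySelector returns null in that case. One way to guard against that is sketched below. The QueryNode interface and the textAt helper are hypothetical names, not part of deno-dom, and the sketch assumes the parsed nodes expose querySelector and textContent as they do in the code above.

```typescript
// Minimal structural type covering only the DOM members used here, so the
// helper accepts parsed DOM nodes as well as plain test doubles.
interface QueryNode {
  querySelector(selector: string): QueryNode | null;
  textContent: string | null;
}

// Hypothetical helper: follow a chain of selectors and return the trimmed
// text of the final match, or null as soon as any selector finds nothing.
function textAt(root: QueryNode, ...selectors: string[]): string | null {
  let node: QueryNode = root;

  for (const selector of selectors) {
    const next = node.querySelector(selector);
    if (next === null) return null;
    node = next;
  }

  return node.textContent?.trim() ?? null;
}
```

With this helper, textAt(doc, '.header', '.h1') would return the heading text when both selectors match and null otherwise, so a change in the page markup does not crash the script.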

Bringing it all together

The script picks up the book info by looping over each .product_pod container on the first page and puts it in the books array.

import { DOMParser } from 'https://deno.land/x/deno_dom/deno-dom-wasm.ts';

const url = 'http://books.toscrape.com/';

try {
  const res = await fetch(url);
  const html = await res.text();
  const doc: any = new DOMParser().parseFromString(html, 'text/html');
  const books: any = [];

  const productsPods = doc.querySelectorAll('.product_pod');

  productsPods.forEach((product: any) => {
    const title = product.querySelector('h3').querySelector('a').getAttribute('title');
    const price = product.querySelector('.price_color').textContent;
    const availability = product.querySelector('.availability').textContent.trim();

    books.push({
      title,
      price,
      availability,
    })
  });

  console.log(books);
} catch(error) {
  console.log(error);
}

Running the script with

deno run --allow-net app.ts

will output an array of books, each with a title, a price, and an availability.
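
Since price is scraped as a text label, any comparison or sorting needs a number first. The sketch below shows one possible post-processing step, under the assumption that each label contains a single decimal amount. The Book interface mirrors the fields pushed into the books array, while parsePrice and cheaperThan are hypothetical helpers that are not part of the original script.

```typescript
// Shape of the objects the loop pushes into `books`.
interface Book {
  title: string;
  price: string;
  availability: string;
}

// Hypothetical helper: pull the numeric amount out of a price label.
// Returns null when the text contains no number.
function parsePrice(label: string): number | null {
  const match = label.match(/\d+(?:\.\d+)?/);
  return match ? Number(match[0]) : null;
}

// Hypothetical helper: keep only the books whose price parses to a number
// that is at most `maxPrice`.
function cheaperThan(books: Book[], maxPrice: number): Book[] {
  return books.filter((book) => {
    const amount = parsePrice(book.price);
    return amount !== null && amount <= maxPrice;
  });
}
```

Declaring the array as const books: Book[] = [] instead of any would also let the compiler check the objects pushed inside the loop.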


GitHub repository: siddacool / deno-web-scraper

An example of a web scraper created with Deno.





