Web scraping with Rust

Mariusz Malek

Posted on December 5, 2022

Web scraping is the process of extracting data from websites and storing it for later use. In this tutorial, we will learn how to perform web scraping in Rust, a statically typed, multi-paradigm programming language that was designed to be safe, concurrent, and fast.

To perform web scraping in Rust, we will need a few tools:

  1. The reqwest library: This library provides a convenient and easy-to-use API for making HTTP requests and handling responses.

  2. The select.rs library: This library allows us to extract data from HTML documents using CSS-selector-like predicates (element names, ids, classes, and attributes).

First, let's create a new Rust project and add the reqwest and select.rs libraries as dependencies in our Cargo.toml file (reqwest needs its blocking feature enabled for the synchronous client we will use):

[dependencies]
reqwest = { version = "0.10", features = ["blocking"] }
select = "0.4"
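
If you don't already have a project to edit, you can create one with Cargo first; the name webscraper here is simply the one that appears in the cargo run output later:

$ cargo new webscraper
$ cd webscraper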

Next, let's open src/main.rs and replace its contents with the following code:

use std::error::Error;

use reqwest::blocking::Client;
use select::document::Document;
use select::predicate::{Attr, Name};

fn main() -> Result<(), Box<dyn Error>> {
    // Fetch the page with reqwest's blocking (synchronous) client.
    let resp = Client::new()
        .get("https://www.rust-lang.org")
        .send()?;

    // Read the response body into a string and parse it into a DOM.
    let body = resp.text()?;
    let document = Document::from(body.as_str());

    // Find the element with id="blog-entries", then every <a> inside it.
    for node in document.find(Attr("id", "blog-entries")) {
        for entry in node.find(Name("a")) {
            let title = entry.text();
            let url = entry.attr("href").unwrap_or("");
            println!("{} ({})", title, url);
        }
    }

    Ok(())
}

In this code, we are using reqwest's blocking client to make an HTTP GET request to the Rust website, and then we are using the select.rs library to extract data from the HTML response. We use a predicate to find the element whose id attribute is "blog-entries", and then we find all a elements within it. For each a element, we print its text (the title of the blog post) and its href attribute (the URL of the blog post).
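
As a side note, the two nested loops can be collapsed into a single query with select.rs's descendant combinator on the Predicate trait. The sketch below assumes the same markup and wraps the logic in a hypothetical helper function, print_blog_links, purely for illustration:

use select::document::Document;
use select::predicate::{Attr, Name, Predicate};

// Hypothetical helper: prints every link found under the #blog-entries element.
fn print_blog_links(html: &str) {
    let document = Document::from(html);
    // descendant() matches any <a> nested anywhere below the #blog-entries node.
    for entry in document.find(Attr("id", "blog-entries").descendant(Name("a"))) {
        println!("{} ({})", entry.text(), entry.attr("href").unwrap_or(""));
    }
}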

Now, let's run our web scraping program:

$ cargo run
   Compiling webscraper v0.1.0 (/home/user/webscraper)
    Finished dev [unoptimized + debuginfo] target(s) in 1.17s
     Running `target/debug/webscraper`
Introducing the Rust 1.52 release channel (https://blog.rust-lang.org/2022/03/03/Rust-1.52.html)
How does the Rust release process work? (https://blog.rust-lang.org/inside-rust/inside-rust-february-2022.html#how-does-the-rust-release-process-work)
…

As you can see, our web scraping program has successfully extracted the title and URL of each blog post from the Rust website.
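
If you would rather use reqwest's default asynchronous API instead of the blocking client, roughly the same program can be written on top of an async runtime. This is only a sketch; it assumes you add tokio (with its macros and rt-multi-thread features) to Cargo.toml and drop reqwest's blocking feature:

use select::document::Document;
use select::predicate::{Attr, Name};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // reqwest::get is the async convenience function; both the request
    // and the body read are awaited.
    let body = reqwest::get("https://www.rust-lang.org")
        .await?
        .text()
        .await?;

    let document = Document::from(body.as_str());
    for node in document.find(Attr("id", "blog-entries")) {
        for entry in node.find(Name("a")) {
            println!("{} ({})", entry.text(), entry.attr("href").unwrap_or(""));
        }
    }

    Ok(())
}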

In conclusion, web scraping in Rust is relatively simple and straightforward using the reqwest and select.rs libraries.
