Creating a priority queue with a custom sort order using a binary heap in Rust

timclicks

Tim McNamara

Posted on October 28, 2022

Creating a priority queue with a custom sort order using a binary heap in Rust

I want to create a personal web crawler. As I download each page, I collect a new list of URLs. To increase the likelihood that I am downloading the most important pages first, I want to keep the list of URLs to visit sorted by length (shorter URLs are more likely to be closer to the front page). That presents a problem though, because I'm also adding URLs.

A naïve list (Vec<T> in Rust), won't be very efficient. After every batch of inserts, I would need to re-sort the list. Wouldn't it be good if there were a data structure that could maintain the sort order itself? Yes! It's called a binary heap.

Rust's standard library provides a binary heap implementation, as std::collections::BinaryHeap. In my case, I want to sort a list of url::Url. By default, they are sorted in alphabetic order.

use url::Url;
use std::collections::BinaryHeap;

fn main() {
    let raw_urls = vec![
        "https://www.youtube.com/watch?v=ywskA8CoulM",
        "https://tim.mcnamara.nz/",
        "https://github.com/rust-lang/rust/issues?labels=E-easy&state=open",
    ];

    let mut urls = BinaryHeap::new();
    for url in raw_urls {
        urls.push(Url::parse(url).unwrap());
    }

    while let Some(url) = urls.pop() {
        println!("{:?}", url.as_str());
    }
}
Enter fullscreen mode Exit fullscreen mode

This prints our URLs, but in alphabetic order:

"https://www.youtube.com/watch?v=ywskA8CoulM"
"https://tim.mcnamara.nz/"
"https://github.com/rust-lang/rust/issues?labels=E-easy&state=open"
Enter fullscreen mode Exit fullscreen mode

Unfortunately, it's a little bit complicated to ask BinaryHeap to use a custom sort order. To change the default, I create a new type that wraps one as ShortestFirst(Url). Then, I implement Ord and PartialOrd traits.

use url::Url;
use std::collections::BinaryHeap;

#[derive(PartialEq, Eq, , Debug)]
struct ShortestFirst(Url);

impl PartialOrd for ShortestFirst {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for ShortestFirst {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        let left = (other.0).as_str().len();
        let right = (self.0).as_str().len();

        left.cmp(&right)
    }
}

fn main() {
    let raw_urls = vec![
        "https://www.youtube.com/watch?v=ywskA8CoulM",
        "https://tim.mcnamara.nz/",
        "https://github.com/rust-lang/rust/issues?labels=E-easy&state=open",
    ];

    let mut urls = BinaryHeap::new();
    for url in raw_urls {
        urls.push(ShortestFirst(Url::parse(url).unwrap()));
    }

    while let Some(ShortestFirst(url)) = urls.pop() {
        println!("{:?}", url.as_str());
    }
}
Enter fullscreen mode Exit fullscreen mode

This prints the URLs, in the order we wanted:

"https://tim.mcnamara.nz/"
"https://www.youtube.com/watch?v=ywskA8CoulM"
"https://github.com/rust-lang/rust/issues?labels=E-easy&state=open"
Enter fullscreen mode Exit fullscreen mode
💖 💪 🙅 🚩
timclicks
Tim McNamara

Posted on October 28, 2022

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related