Creating a priority queue with a custom sort order using a binary heap in Rust
Tim McNamara
Posted on October 28, 2022
I want to create a personal web crawler. As I download each page, I collect a new list of URLs. To increase the likelihood that I am downloading the most important pages first, I want to keep the list of URLs to visit sorted by length (shorter URLs are more likely to be closer to the front page). That presents a problem though, because I'm also adding URLs.
A naïve list (Vec<T>
in Rust), won't be very efficient. After every batch of inserts, I would need to re-sort the list. Wouldn't it be good if there were a data structure that could maintain the sort order itself? Yes! It's called a binary heap.
Rust's standard library provides a binary heap implementation, as std::collections::BinaryHeap
. In my case, I want to sort a list of url::Url
. By default, they are sorted in alphabetic order.
use url::Url;
use std::collections::BinaryHeap;
fn main() {
let raw_urls = vec![
"https://www.youtube.com/watch?v=ywskA8CoulM",
"https://tim.mcnamara.nz/",
"https://github.com/rust-lang/rust/issues?labels=E-easy&state=open",
];
let mut urls = BinaryHeap::new();
for url in raw_urls {
urls.push(Url::parse(url).unwrap());
}
while let Some(url) = urls.pop() {
println!("{:?}", url.as_str());
}
}
This prints our URLs, but in alphabetic order:
"https://www.youtube.com/watch?v=ywskA8CoulM"
"https://tim.mcnamara.nz/"
"https://github.com/rust-lang/rust/issues?labels=E-easy&state=open"
Unfortunately, it's a little bit complicated to ask BinaryHeap
to use a custom sort order. To change the default, I create a new type that wraps one as ShortestFirst(Url)
. Then, I implement Ord
and PartialOrd
traits.
use url::Url;
use std::collections::BinaryHeap;
#[derive(PartialEq, Eq, , Debug)]
struct ShortestFirst(Url);
impl PartialOrd for ShortestFirst {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
Some(self.cmp(other))
}
}
impl Ord for ShortestFirst {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
let left = (other.0).as_str().len();
let right = (self.0).as_str().len();
left.cmp(&right)
}
}
fn main() {
let raw_urls = vec![
"https://www.youtube.com/watch?v=ywskA8CoulM",
"https://tim.mcnamara.nz/",
"https://github.com/rust-lang/rust/issues?labels=E-easy&state=open",
];
let mut urls = BinaryHeap::new();
for url in raw_urls {
urls.push(ShortestFirst(Url::parse(url).unwrap()));
}
while let Some(ShortestFirst(url)) = urls.pop() {
println!("{:?}", url.as_str());
}
}
This prints the URLs, in the order we wanted:
"https://tim.mcnamara.nz/"
"https://www.youtube.com/watch?v=ywskA8CoulM"
"https://github.com/rust-lang/rust/issues?labels=E-easy&state=open"
Posted on October 28, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
October 16, 2024
October 28, 2022