Examining HN Discovery Quality Using Existing Complaints
Daniel Griffin
Posted on September 10, 2024
We've launched a new Hacker News search experience, focused on discovery: hn.trieve.ai (GitHub: Trieve API backend, frontend search interface).
Hacker News has long been a playground for search innovation, with the community often leaning in to explore new possibilities in search. Over the past six months, Nick has looked back at the various HN search experiences and detailed his findings in a post: History of HackerNews Search: From 2007 to 2024.
We combed through HN (and user issues posted to Algolia's repo for HN search) in search of search complaints. Over the years there have been some complaints about indexing issues, and we're not covering those in this post. Instead, we looked for examples where people shared actual search queries. For each query, we looked for what they said or implied about their search intent and the search results they found. What have people said about search quality? What searches are not possible or not easy in the current HN search? When are folks resorting to running a site: search on a search engine like Google? Where can Trieve help make search better?
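For readers unfamiliar with the site: workaround mentioned above, here is a minimal sketch of how such a query could be built. The helper name google_site_search_url is hypothetical, and news.ycombinator.com is assumed to be the domain people restrict to; the only thing taken from the complaints themselves is the query text.

```python
from urllib.parse import urlencode


def google_site_search_url(query: str, site: str = "news.ycombinator.com") -> str:
    """Build a Google search URL restricted to one site via the site: operator.

    The helper name and the default domain are illustrative assumptions.
    """
    # The site: operator goes in front of the user's query text.
    full_query = f"site:{site} {query}"
    # urlencode percent-encodes the colon and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": full_query})


print(google_site_search_url("postgres clustering"))
# https://www.google.com/search?q=site%3Anews.ycombinator.com+postgres+clustering
```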
Discover well beyond exact matches in titles
Searching for "postgres clustering"
- Search intent: Finding information about PostgreSQL clustering solutions
- Sourced from: @avereveard comment on Hacker News
Algolia for "postgres clustering"
Trieve for "postgres clustering"
Searching for "AT&T says criminals stole phone records of 'nearly all' customers in new data breach"
- Search intent: Exact phrase match with long query
- Sourced from: @Pranoy1c on github/algolia/hn-search/issues
Algolia for "AT&T says criminals stole phone records of nearly all customers in new data breach"
Trieve for "AT&T says criminals stole phone records of nearly all customers in new data breach"
Searching with special characters
Searching for "[video]"
- Search intent: Exact string match when string contains non-alphanumeric characters
- Sourced from: @some1else on github/algolia/hn-search/issues
Algolia for "[video]"
Trieve (semantic) for "[video]"
Algolia (quoted) for "[video]"
Trieve (semantic, quoted) for "[video]"
Searching for "AT&T"
- Search intent: Exact match for acronym (with ampersand)
- Sourced from: @Pranoy1c on github/algolia/hn-search/issues
Algolia (prefix=true) for "AT&T"
Algolia (prefix=false) for "AT&T"
Algolia (quoted) for "AT&T"
Trieve for "AT&T"
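If you reproduce these comparisons by editing URLs rather than typing into the search box, note that an unescaped ampersand is itself a query-string separator. The sketch below only illustrates that client-side pitfall using Python's standard library; the parameter name query is an assumption, and this is separate from how either engine tokenizes "AT&T" once the full string arrives.

```python
from urllib.parse import parse_qs, urlencode

# Hand-written query string: the '&' splits the value in two.
raw = "query=AT&T"
print(parse_qs(raw))
# {'query': ['AT']}

# Letting urlencode escape the ampersand keeps the term intact.
encoded = urlencode({"query": "AT&T"})
print(encoded)
# query=AT%26T
print(parse_qs(encoded))
# {'query': ['AT&T']}
```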
Out-of-domain strings
Searching for "lootitooti"
- Search intent: Match for out-of-domain string
- Sourced from: @douglaskayama comment on Hacker News
Algolia for "lootitooti"
Trieve for "lootitooti"
Presque vu searches
Searching for "deterministic Docker builds"
- Search intent: Trying to remember "the name of an SaaS to pin/cache/ back up my apt/apk/pip dependencies"
- Sourced from: @codethief comment on Hacker News
Algolia (type: Story) for "deterministic Docker builds"
Trieve (type: Story) for "deterministic Docker builds"
Searching for "tip of your tongue phenomenon"
Bonus! Again, the precision-focused approach of requiring "your" has downsides.
Algolia for "tip of your tongue phenomenon"
Trieve for "tip of your tongue phenomenon"
Filter on author with a hyphenated username
Searching for ""It Won't Fail Because of Me" by:1970-01-01"
- Search intent: Find posts by author with a hyphenated username
- Sourced from: @SushiHippie Ask HN on Hacker News
Algolia for "It Won't Fail Because of Me" by:1970-01-01
Trieve for "It Won't Fail Because of Me"
Default sorting by relevance vs. popularity metrics
Searching for "Excel"
- Search intent: Find popular Excel-related stories on Hacker News
- Sourced from: @airstrike on Hacker News
This is from a comparison that @airstrike shared after we launched our discovery search. He preferred the results from Algolia. Algolia defaults to a popularity-based sort. Algolia also has sort-by-date, but does not have a specific relevance-focused sorting option.
Trieve offers multiple sorting options:
- default: relevance (not tuned to extrinsic popularity metrics)
- number of points (similar to Algolia's "popularity")
- date (reverse chronological)
- descendants (number of comments)
Nick (@skeptrune) responded with some of our internal deliberations:
We went back and forth on making points sorting default and ended up deciding against it, but maybe we should have. Our thinking was that since it's focused on "discovery" it was worth prioritizing relevance, but I can see how it can feel the result quality isn't as great.
If someone is looking for more of the popularity-focused results, they can start their Trieve HN Discovery searches with the sortby= parameter set to num_value (try this link).
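As a rough sketch of what such a link could look like, the snippet below builds a URL with sortby=num_value. Only the sortby parameter and its num_value setting come from this post; the q parameter name, the root path, and the helper name trieve_hn_url are assumptions for illustration, so check the address bar on hn.trieve.ai for the real shape.

```python
from typing import Optional
from urllib.parse import urlencode

# Site named in this post; the root path is an assumption.
TRIEVE_HN_BASE = "https://hn.trieve.ai/"


def trieve_hn_url(query: str, sortby: Optional[str] = None) -> str:
    """Build a hypothetical Trieve HN Discovery search URL."""
    # "q" is an assumed name for the query parameter.
    params = {"q": query}
    if sortby is not None:
        # "sortby" and the value "num_value" are the ones named in this post.
        params["sortby"] = sortby
    return TRIEVE_HN_BASE + "?" + urlencode(params)


print(trieve_hn_url("Excel", sortby="num_value"))
# https://hn.trieve.ai/?q=Excel&sortby=num_value
```

Leaving sortby unset keeps the default relevance ordering described above.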
Algolia (sorted by popularity) for "Excel"
Trieve (sorted by relevance) for "Excel"
Trieve (sorted by points) for "Excel"
Trieve (sorted by descendants) for "Excel"
If you want to explore comparisons between the Algolia HN Search and our Trieve HN Discovery, it can help to use our "Try it with Trieve!" button via our open-source unpacked Chrome extension: github.com/devflowinc/try-it-with-trieve
A "Try it with Trieve!" button in action
Learn something from this post? If you'd like to support our project, we'd be grateful if you'd explore and star our GitHub repository!