Rob Hoelz
Posted on February 25, 2020
Originally published at https://hoelz.ro/blog/using-sqlites-fsdir-to-effect-a-quick-search-index
At work, we have a wiki we use for knowledge sharing. It's implemented on top of gitit, which works pretty well for our purposes, but sometimes I find the search functionality lacking. What I'd like is something that has more advanced full text search-like features - things like stemming (so that searching for "databases" finds documents that mention "database") and ranking by query relevance (so that documents that mention "database" many times are ranked higher than those that just mention "database" once).
Well, SQLite happens to have two pieces of functionality that can help in this regard: the FTS extension, which implements those features I just mentioned, and the fsdir table-valued function, which we can use to treat the filesystem like a table. Since gitit stores its pages in a Git repository, all we need to do is clone the repository and run a few SQL snippets to build our search index:
$ git clone work:wiki
$ sqlite3 wiki-search.db "create virtual table page_search using fts5 (path, page, tokenize='porter unicode61')"
$ sqlite3 wiki-search.db "insert into page_search select name, data from fsdir('wiki')"
Now we can search the wiki and benefit from those features: select path, snippet(page_search, 1, '', '', '...', 30) from page_search where page_search match 'my query' order by rank
Granted, it would be nice to do things like extract the front matter for each page, along with things like edit dates, but this works well enough.
Posted on February 25, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.