PostgreSQL Full Text Search Rank by Position
kvetoslavnovak
Posted on November 30, 2024
Recently, I encountered a problem with full-text search. I am using this feature in my search input, where the backend sends hints of possible matches as you type. The backend database is PostgreSQL. I needed the hints to be ranked by the position of the searched term in the text.
Full Text Search in PostgreSQL
Full-text search in PostgreSQL can be achieved quite easily. There are two main tools to use:
- tsvector - represents a searchable document.
- tsquery - represents the search query to perform against a document.
Let’s say we want to search the titles of our blog posts. To make them searchable, we can use the following query:
SELECT *
FROM blogposts
WHERE to_tsquery('JavaScript') @@ to_tsvector(posts.title);
In this case, we are converting the post titles to a tsvector dynamically with every search. However, this transformation takes some time. A better approach is to perform this transformation in advance in the database and also store it as an indexes for the titles for faster search.
Lets create a new collumn of vectors of titles and also index this new column:
ALTER TABLE blogposts ADD COLUMN search_vector tsvector;
UPDATE blogposts SET search_vector = (to_tsvector(posts.title));
CREATE INDEX titles_fts_idx ON blogposts USING gin(search_vector);
Now try the search the term "JavaScript"
SELECT *
FROM blogposts
WHERE to_tsquery('JavaScript') @@ search_vector;
You may also make indexes from ts vectors directly on titles column like this:
CREATE INDEX titles_fts_idx ON blogposts USING GIN (to_tsvector(posts.title));
and use the search like this:
SELECT *
FROM blogposts
WHERE to_tsquery('JavaScript') @@ posts.title;
Now, the full-text search will be blazingly fast, completing in milliseconds.
Ranking the Results
PostgreSQL provides the ts_rank feature, which allows you to score search results and order them based on their ranking. PostgreSQL supports the following ranking options:
- 0 (the default) ignores the document length
- 1 divides the rank by 1 + the logarithm of the document length
- 2 divides the rank by the document length
- 4 divides the rank by the mean harmonic distance between extents (this is implemented only by ts_rank_cd)
- 8 divides the rank by the number of unique words in document
- 16 divides the rank by 1 + the logarithm of the number of unique words in document
- 32 divides the rank by itself + 1
You can use the ts_rank like this:
SELECT
...
ts_rank(search_vector, query, 0) as rank_title
...
ORDER BY rank_title DESC NULLS LAST
However, there is no built-in ranking option based on the position of the search term within the string (i.e. title column).
POSITION to the rescue
Fortunatelly there is POSITION function in PostgreSQL. The PostgreSQL POSITION function is used to find the location of a substring within a given string. In our case we can use it like this
...
ts_rank(search_vector, query, 2) + (1.0 / NULLIF(POSITION(query IN posts.title), 0) + 0.0001) as rank
...
ORDER BY rank_title DESC NULLS LAST
ts_rank uses normalization integer 2 because 2 divides the rank by the document length
Magick number + 0.0001 is to avoid dividing by the 0 because POSTION function counts from 1 not 0 and returns 0 if the string is not found.
Final query would look like this:
SELECT *
FROM blogposts
WHERE to_tsquery('JavaScript') @@ search_vector
ts_rank(search_vector, query, 2) + (1.0 / NULLIF(POSITION(query IN posts.title), 0) + 0.0001) as rank
ORDER BY rank_title DESC NULLS LAST;
Searching for more terms
One caveat has to be mentioned if you search for more terms at once (like JavaScript and TypeScript).
The arguments for the to_tsquery function may be used with a big fleibility, including logical operators etc. POSITION function on the other hand is "just" a substring in string.
Here are the links to the documentatio in matter:
- https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-DOCUMENTS
- https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING
- https://www.postgresql.org/docs/9.1/functions-string.html
Posted on November 30, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.