Learn PostgreSQL! - How to use String matching from multiple columns?
Saif Ali
Posted on August 6, 2023
Introduction
In the vast landscape of data management, PostgreSQL stands as a powerful and versatile open-source relational database system. As data enthusiasts, we are often faced with complex challenges in extracting valuable information from our databases efficiently. One such common task is searching for a specific word that may occur as a standalone word or as part of a larger word across multiple columns. In this comprehensive guide, we will explore the most scalable and effective solution to achieve this in PostgreSQL without resorting to separate LIKE statements for each column.
Similarly, I was working on an extensive database with tons of data. I faced a similar problem where I had to look up for occurrence of "yes" in not one but six columns at once. I needed to design a query that will return rows if any of the columns contained the word "yes".
Naive Approach
The most basic approach that I thought about first was to use separate LIKE
statement for all the columns in my WHERE
clause.
Although applicable and suitable for small sets of data, this approach can be tiresome and exhausting on larger tables both for the computer and developer. So what can be the smarter approach?
A Better Approach
Use concatenations and functional regex matching! I knew about concatenation but the shorter syntax for Regex matching I discovered in PostgreSQL docs was entirely new to me.
How did this work? Let me show an example with code!
SELECT *
FROM my_table
WHERE CONCAT_WS(' ', col1, col2, col3, col4, col5, col6) ~ '\myes\b';
In this query:
CONCAT_WS(' ', col1, col2, col3, col4, col5, col6)
combines all six columns into a single string, using a space as the separator between the values.
The ~
operator is used to apply a regular expression pattern on the concatenated string.
'\myes\b'
is the regular expression pattern. The \m
and \b
are word boundary anchors that ensure that "yes" is matched as a whole word and not as a part of a larger word. So, it will match "yes" as a standalone word and not as part of another word like "yesterday".
This approach is scalable as it allows you to easily search for the word "yes" in any number of columns without having to write separate LIKE statements for each column. It effectively treats all six columns as one combined column for the purpose of the search.
You can look up the more refined and shorthand syntax for Regex matching in PostgreSQL at PostgreSQL Docs
Posted on August 6, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
November 30, 2024