Search optimized database (full-text search) for system design interview

dleedev365

Daniel Lee

Posted on October 16, 2024

Search optimized database (full-text search) for system design interview

Traditional databases run a table scan to find a search term in the database. This is slow and efficient if a table stores a large dataset (1000+ rows). To improve, "search optimized database" can be used.

It uses indexing, tokenization and stemming to make search queries fast and efficient (by building "inverted indexes"):

  • Tokenization is a process of reducing words to their root form. For example, "running" and "runs" can be reduced to "run".
  • Stemming is a process of breaking a piece of task into individual words. It helps mapping words to documents containing those words in the inverted indexes.

Something to note is the underlying data structure of search mechanisms of search optimized database - Inverted Indexes.

It is a data structure that maps words to the documents that contain them. For example:

{
  "word1": [doc1,doc2,doc3],
  "word2": [doc2,doc3,doc6]
}
Enter fullscreen mode Exit fullscreen mode

Most search optimized database also support "Fuzzy Search" out of box as a configuration. Fuzzy Search works by leveraging "edit distance calculation" technique, which measures how many letters to be changed/added/removed to transform one word into another. Thus, results with minor misspellings or discrepancy relative to the search term can be returned efficiently in case of human errors.

One of the popular search optimized database is "ElasticSearch".


Reference

https://www.hellointerview.com/learn/system-design/in-a-hurry/key-technologies

💖 💪 🙅 🚩
dleedev365
Daniel Lee

Posted on October 16, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related