Solving a string prefix problem using Trie in golang

gnsp

Ganesh Prasad

Posted on June 21, 2022

Solving a string prefix problem using Trie in golang

A few days ago we conducted a learning session on golang for my team at work. It was a short session covering the basics of the language. Since then, I had been thinking of writing a series of blogposts/tutorials about using these basic concepts to solve real algorithmic problems using golang. Today (Sunday, 19th June 2022) I came across this interesting problem about finding matching strings from a list of strings for a given prefix while browsing through some old folders on my laptop. I had solved this problem in Javascript some 3 years ago; today I decided to give it a 'Go' !

The problem

This problem is available on leetcode, here is the link.

Problem statement

You are given an array of strings products and a string searchWord.

Design a system that suggests at most three product names from products after each character of searchWord is typed. Suggested products should have common prefix with searchWord. If there are more than three products with a common prefix return the three lexicographically minimum products.

Return a list of lists of the suggested products after each character of searchWord is typed.

Constraints

  • 1 <= products.length <= 1000
  • 1 <= products[i].length <= 3000
  • 1 <= sum(products[i].length) <= 2 * 10^4
  • All the strings of products are unique.
  • products[i] consists of lowercase English letters.
  • 1 <= searchWord.length <= 1000
  • searchWord consists of lowercase English letters.

Example testcases

Example 1



Input: products =["mobile","mouse","moneypot","monitor","mousepad"], searchWord = "mouse"
Output: [
  ["mobile","moneypot","monitor"],
  ["mobile","moneypot","monitor"],
  ["mouse","mousepad"],
  ["mouse","mousepad"],
  ["mouse","mousepad"]
]


Enter fullscreen mode Exit fullscreen mode

Example 2



Input: products = ["havana"], searchWord = "havana"
Output: [["havana"],["havana"],["havana"],["havana"],["havana"],["havana"]]


Enter fullscreen mode Exit fullscreen mode

Example 3



Input: products = ["bags","baggage","banner","box","cloths"], searchWord = "bags"
Output: [["baggage","bags","banner"],["baggage","bags","banner"],["baggage","bags"],["bags"]]


Enter fullscreen mode Exit fullscreen mode

Solving the problem

This problem is about finding matching strings from a list of strings products for each of the prefixes obtained from searchWord taking one character at a time and appending it to the previous prefix. For example, if the searchWord is "cat", then the prefixes for which we need to find matching strings will be ["c", "ca", "cat"]. For each of these prefixes we are to find 3 matching strings (in lexicographical order, ascending) from the array of strings products.

This problem can be solved efficiently by using the datastructure Trie aka Prefix tree; by efficiently I mean with the worst case time conplexity of O(M+n) where M is the number of characters in the array products and n is the length of searchWord. Before we start implementing the solution, let's have a quick overview of the Trie datastructure.

Trie

A trie is a k-ary search tree. It stores multiple words/strings by taking each cahracter as a node. If multiple words have the same prefix then they follow the same path from the root of the trie until the end of the common prefix. For example, if the list of words is ["CAT", "CAN", "BAT", "BAG"] then the corresponding trie can be visualised as the following diagram.

trie

Here the root node stands for the prefix empty string. As the empty string is a valid prefix for all strings in the list, all of them can be matched from the root node. Then we have the words ["CAT", "CAN"] that have a common prefix "C" and the words ["BAT", "BAG"] that have a common prefix "B". Therefore the root node will have 2 children nodes B and C. Similarly each of the children nodes B and C will have one child node each, standing for A; at this point the prefixes for the respective paths would be "BA" and "CA" and so on.

You can read more about Tries here.

Implementing the solution

In this problem, we want to store only 3 matching words (sorted in lexicographical order) from products at each node. So, each node will have an array of 3 strings to store the matches and a list of pointers to its children.

Let's define this trie structure in go. To make things more space efficient, rather than storing the actual matching strings in each node, we can store their indices in the products array and then look up the actual strings using these indices when we need to output the actual strings. The lexicographical ordering can be easily handled by sorting the products array once. (Think why ?) We can create a struct type to define each node of the trie like the following:



type trie struct {
  words [3]int
  wordsLen int
  children map[string]*trie
}


Enter fullscreen mode Exit fullscreen mode

Here children is a map from strings to pointers to child nodes of the trie node. A tree is a recusrive datastructure, and it's efficient to strore pointers to child nodes on each node of the trie. The wordsLen field is to keep track of how many entries in the words array are valid indices. In golang, int values default to 0, and 0 can be a valid index in the products array even when the 0th string in the products is not a matching string for the node. Therefore it's necessary to keep track of where the valid indices end inside the words array.

As the trie struct has a map field declared inside it and we can not use it before initializing the map, let's also write a helper function to create a trie node, initialize the map and return a pointer to the node.



func newTrie () *trie {
  var t trie

  // initialize the map
  t.children = make(map[string]*trie)

  // return the pointer to the trie node
  return &t
}


Enter fullscreen mode Exit fullscreen mode

Now that we have the trie structure in place, let's write a method to insert words into it. If a word is among the 3 lexicographically smallest matching strings for a node, then we need to store its index in the words array of that node (and increment the wordsLen field). If we have already stored the 3 lexicographcally minimum matching strings for a node, then we do not need to store the word in that node and we can move to its child nodes recursively.

The method needs to take the word, its index in the products array and the current character of the word being inserted into the trie node as arguments. If the trie node does not have a child for the current character, then we need to create a trie node and add it to the children map against the current character.



// Here index is the index of the word in products
// and pos is the index of the current character being added to the trie

func (t *trie) insertWord (word string, index int, pos int) int{

  // if we have not found 3 matching strings yet, then store the word (index)
  if t.wordsLen < 3 {
    t.words[t.wordsLen] = index
    t.wordsLen += 1
  }

  // if we have reached the end of the word, then do nothing else and return
  if pos == len(word) {
    return 0
  }

  // get the current character and check if there is a child node for that character
  // if not, then create a new trie node and add it to children
  ch := string(word[pos])
  if t.children[ch] == nil {
    t.children[ch] = newTrie()
  }

  // recursively insert the word in the corresponding child node
  t.children[ch].insertWord(word, index, pos+1)
  return 0
}


Enter fullscreen mode Exit fullscreen mode

Now that we have the trie structure and the method to insert strings in it, let's write the function to find matching strings for the prefixes. This function will take the array of strings products and the string searchWord as arguments and return a slice of slice of strings with the matching products for each of the prefixes of the searchWord.

First we need to sort the slice products lexicographically. We can use the native package sort to do this. Then we need to initialize the trie and insert all words in products into it.

Then we can traverse the trie from root, taking one character at a time from the searchWord. We will initialize the current trie node to root and mismatched flag to false. For each chacater of the searchWord we need to check if the current node has a child corresponding to the character. If there is a child node, then we have a match and we add the words array of that node to the return slice and set the child node as the current node. If there is no child node corresponding to the character, then we have a mismatch and all subsequent entries in the return slice will be empty, we will set the mismatched flag to true to keep track of this. This is a standard depth first traversal of the trie and it can be implemented like the following.



func suggestedProducts(products []string, searchWord string) [][]string {

  // sort the products slice
  sort.Strings(products)

  // initialize the trie and insert all words in products into it
  root := *newTrie()
  for i, word := range products {
    root.insertWord(word, i, 0)
  }

  // declare the return value, the mismatched flag and current trie node to root
  var ret [][]string
  mismatched := false
  t := root

  // for each character of the searchWord traverse the trie and find matching words
  // if we are unable to find any matches at any point then we can set the mismatched
  // flag to true and all subsequent matches will be empty slices.

  for _, ch := range searchWord {
    ch := string(ch)
    if t.children[ch] == nil || mismatched {
      list := make([]string, 0)
      ret = append(ret, list)
      mismatched = true
      continue
    }

    t = *t.children[ch]
    var list []string
    var i int
    for i=0; i<t.wordsLen; i+=1 {
      list = append(list, products[t.words[i]])
    }
    ret = append(ret, list)
  }
  return ret
}


Enter fullscreen mode Exit fullscreen mode

This concludes our solution. But we need to test it against the example test cases to be sure that it's indeed a correct solution for the given problem.

Testing

In golang, testing is a really simple process. Go standard library has the testing package to help us test and benchmark our implementations with ease. Go automatically detects test files from the filenames; if a filename ends with _test.go then go considers that file to be a test file. Inside a test file, the functions that start with Test and take one argument of the type *testing.T are run by the test runner automatically. Similarly functions that start with Benchmark and take one argument of the type *testing.B are run for benchmarking. In case of our solution, we are focused on the unit testing of our solution. So we can write a function TestSuggestedProducts and add our testcases
inside it. We can use the standard library package reflect to check if the output and expected slices are deeply equal. The following is the complete code for this problem along with the test function.



package dsproblems

import (
  "sort"
  "testing"
  "reflect"
  "fmt"
)

type trie struct {
  words [3]int
  wordsLen int
  children map[string]*trie
}

func newTrie () *trie {
  var t trie
  t.children = make(map[string]*trie)
  return &t
}

func (t *trie) insertWord (word string, index int, pos int) int{
  if t.wordsLen < 3 {
    t.words[t.wordsLen] = index
    t.wordsLen += 1
  }

  if pos == len(word) {
    return 0
  }

  ch := string(word[pos])
  if t.children[ch] == nil {
    t.children[ch] = newTrie()
  }
  t.children[ch].insertWord(word, index, pos+1)
  return 0
}

func suggestedProducts(products []string, searchWord string) [][]string {
  sort.Strings(products)

  root := *newTrie()
  for i, word := range products {
    root.insertWord(word, i, 0)
  }

  var ret [][]string
  mismatched := false
  t := root
  for _, ch := range searchWord {
    ch := string(ch)
    if t.children[ch] == nil || mismatched {
      list := make([]string, 0)
      ret = append(ret, list)
      mismatched = true
      continue
    }

    t = *t.children[ch]
    var list []string
    var i int
    for i=0; i<t.wordsLen; i+=1 {
      list = append(list, products[t.words[i]])
    }
    ret = append(ret, list)
  }
  return ret
}

func TestSuggestedProducts(t *testing.T) {
  products := []string{"mobile","mouse","moneypot","monitor","mousepad"}
  searchWord := "mouse"
  output := [][]string{
    {"mobile","moneypot","monitor"}, 
    {"mobile","moneypot","monitor"}, 
    {"mouse","mousepad"}, 
    {"mouse","mousepad"}, 
    {"mouse","mousepad"},
  }
  fmt.Println(output)
  if !reflect.DeepEqual(suggestedProducts(products, searchWord), output) {
    t.Fatalf("Failed testcase #1")
  }

  products = []string{"havana"}
  searchWord = "havana"
  output = [][]string{
    {"havana"}, 
    {"havana"}, 
    {"havana"}, 
    {"havana"}, 
    {"havana"}, 
    {"havana"},
  }
  fmt.Println(output)
  if !reflect.DeepEqual(suggestedProducts(products, searchWord), output) {
    t.Fatalf("Failed testcase #2")
  }

  products = []string{"bags","baggage","banner","box","cloths"}
  searchWord = "bags"
  output = [][]string{
    {"baggage","bags","banner"}, 
    {"baggage","bags","banner"}, 
    {"baggage","bags"}, 
    {"bags"},
  }
  fmt.Println(output)
  if !reflect.DeepEqual(suggestedProducts(products, searchWord), output) {
    t.Fatalf("Failed testcase #3")
  }
}


Enter fullscreen mode Exit fullscreen mode

So, that's that and we just solved the problem using the trie datastructure in golang. Also, here is an implementation of the same solution in Javascript for reference.



class Trie {
  constructor(words=[], children={}) {
    this.words = words;
    this.children = children;
  }

  insertWord(word, i, j) {
    if (this.words.length < 3)
      this.words.push(i);

    if (j === word.length)
      return;

    const ch = word.charAt(j);
    if (this.children[ch] === undefined) {
      this.children[ch] = new Trie();
    }
    this.children[ch].insertWord(word, i, j+1);
  }
}

function suggest(words, search) {
  const sortedWords = [...words];
  sortedWords.sort((a, b) => a.localeCompare(b));

  const trie = new Trie();
  sortedWords.forEach((word, i) => {
    trie.insertWord(word, i, 0);
  });

  return search.split('').reduce((acc, ch) => {
    if (acc.t.children[ch] === undefined || acc.done) {
      acc.ret.push([]);
      acc.done = true;
      return acc;
    }

    acc.t = acc.t.children[ch];
    acc.ret.push(acc.t.words.map(ind => sortedWords[ind]));
    return acc;
  }, {ret: [], t: trie, done: false}).ret;
}


Enter fullscreen mode Exit fullscreen mode

Hoping you had a nice read and learned something new from this article. Have nice days ahead. :)

💖 💪 🙅 🚩
gnsp
Ganesh Prasad

Posted on June 21, 2022

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related