Whiteboarding in Python: Can You Solve This Simple String Problem?

pythonwb

Whiteboarding in Python

Posted on October 12, 2020

Whiteboarding in Python: Can You Solve This Simple String Problem?

<< Week 4: Binary Search | View Solution on Github | Week 6: Misc 3 >>

worm on a string toys
(Image Source: Buzzfeed) Despite everything, I'm still just a worm on a string.

Happy Monday, and welcome back to Whiteboarding in Python. I hope last week I didn't scare you away with algorithms and math, so this week, we'll look at an easier problem involving strings. Let's jump right in.

# Determine if a string has all unique characters. 
# You cannot use additional data structures. 
Enter fullscreen mode Exit fullscreen mode

Pretty simple, right? That's because it's the first question listed in the "Arrays and Strings" section of Cracking the Coding Interview. There are many different approaches we can take that differ in time and space complexity, so we'll walk through a couple of them.

First Approach: Brute Force

The brute force is usually the simplest way to solve any whiteboarding problem, so I find it helpful to come up with that solution first before moving onto more sophisticated methods. At least then, we have a solution to fall back on if the fancier methods don't work. And, it's good practice for computer science fundamentals. Reading through this solution will help give some clarity about how to start approaching any problem you may come across.

"Brute force" is exactly as it sounds. Imagine you had a jigsaw puzzle where all the pieces were the same color. You'd have to blindly try every combination until you found two that fit together. Here, we're going to check every character to every other character in the string to see if it's a match. Let's start by defining the method (either in Repl.it, or you could try your hand at old fashioned pencil and paper). It will take one string as a parameter, which we'll call s.

def is_unique(s):
    pass
Enter fullscreen mode Exit fullscreen mode

Next, we'll define the first forloop. There will be nested loops, and this one will loop through every character in the string until the second to last. Why not until the last character? We'll be comparing each character to the ones following it, and there are none after the last character in the list.

def is_unique(s):
  for i in range(len(s) - 1):
Enter fullscreen mode Exit fullscreen mode

Now let's make a for loop inside that one. It will starts with the next spot in the list, which we'll call index j. Index j will traverse the list until the end, comparing it to the character at the first index, i. We'll return False if they match, meaning the characters in the string are not unique.

def is_unique(s):
  for i in range(len(s) - 1):
    for j in range(i + 1, len(s)):
      if s[j] == s[i]:
        return False
Enter fullscreen mode Exit fullscreen mode

Finally, we need to return True somewhere. If the nested for loop successfully compares each letter together and never finds a match, that means the strings are unique. Thus, we'll return True outside the nested loop.

def is_unique(s):
  for i in range(len(s) - 1):
    for j in range(i + 1, len(s)):
      if s[j] == s[i]:
        return False
  return True
Enter fullscreen mode Exit fullscreen mode

And that's it! Calling is_unique("Hello") should return False, and is_unique("Bye") should return True.

Time and Space Complexity of the Brute Force Method

For one, our solution fills the requirement of "no additional data structures." We simply loop through the string without saving information to a new data structure. That gives us a space complexity of O(1), having no relation to the length of the string.

How about time complexity? To imagine the worst case, there are no unique characters, so the entirety of the nested loop has to run. This we can assume to be about O(N2), although we saved some time by checking each pair only once. I did the math, and it's technically O( ∑N-1i=1 i ), but this is not a typical time complexity, so for now we'll say it's O(N2). If you remember the graph from last week, O(N2) is pretty terrible. However, this is about the best we can do if we can make no new additional data structures or modify the original string.

More Optimal: Sort the String

Now that have brute force out of the way, let's try to come up with some more elegant solutions. We started to learn about searching and sorting last week, so let's think about how we could use those concepts here. If we sort the string, we could loop over every character and make sure it doesn't match its previous character. Let's get started.

The Python .sort() method only works on lists. So, our first task will be to turn the string into a list of characters. I'll reiterate, but for whiteboarding problems in Python, this is really something you'll want to know by heart.

s_as_list = [char for char in s]
Enter fullscreen mode Exit fullscreen mode

The for loop iterates each character in the string s, and returns that character. We cast the final product to a list by wrapping it in square brackets (a couple of square boys, as my students call them).

Once we turn the string into a list, we can call sort() on it.

def is_unique2(s):
  s_as_list = [char for char in s]
  s_as_list.sort()
Enter fullscreen mode Exit fullscreen mode

Now, we can loop through the list. We'll compare each letter to the last, and we can do this in a couple of ways. We can iterate over each index, starting at index 1 (if it exists) until the end, or, if we're not keeping track of the index, we can save the previous letter to a variable. We can initialize it to an empty string.

    prev = ""
    for letter in s_as_list:
Enter fullscreen mode Exit fullscreen mode

For each iteration, we want to do one of two things:

  1. If the character is the same as the previous one, return False.
  2. Otherwise, make that character the new previous character.

We can turn this into an if statement. I'm using the variable name letter instead of char just so we don't confuse it with where we used it earlier, when we made the string into a list.

  prev = ""
  for letter in s_as_list:
    if letter == prev:
      return False
    else: 
      prev = letter
Enter fullscreen mode Exit fullscreen mode

Finally, if the for loop was executed successfully without finding a match, we'll return True outside of the loop. Altogether:

def is_unique2(s):
  s_as_list = [char for char in s]
  s_as_list.sort()
  prev = ""
  for letter in s_as_list:
    if letter == prev:
      return False
    else: 
      prev = letter
  return True
Enter fullscreen mode Exit fullscreen mode

Time Complexity for Method Using Sort

The complexity of our method depends on the time complexity of Python's .sort() method. Python uses Timsort, which is a hybrid between a merge and insertion sort, if you're familiar with those. The average and worst case complexity is O(N log N). Additionally, we loop through the list N times, although we neglect this since O(N log N) is larger. However, since we have to make a new list to sort the string, we no longer fill the requirement that additional data structures cannot be used.

Even More Optimal Time Complexity: Use a Dictionary

Let's throw out the "no additional data structures" rule altogether. What solution could we come up with? In Cracking the Coding Interview, the first solution suggests using an array of length 128, the lenght of the Unicode alphabet. However, if you've used Python once or twice, by now you probably know to use a dictionary.

In our third week, we used the Python library default dictionary. If you remember, defaultdict allows us to set a default type for values contained in a dictionary, so if we call a key that doesn't exist, it will give us the falsey version of that type. This saves us the hassle of having to check if a key is in the list first before checking its value. We can set our dictionary dd to type bool so that the default value is always False.

from collections import defaultdict

def is_unique3(s):
  dd = defaultdict(bool)
Enter fullscreen mode Exit fullscreen mode

Next, let's write the for loop. We'll iterate over every character in the string, and if its value is not False, meaning that character has been seen before, we'll return False. Otherwise, we'll set the dictionary value at that character to be True.

  for char in s:
    if dd[char]:
      return False
    dd[char] = True
Enter fullscreen mode Exit fullscreen mode

Finally, we'll return True outside of the loop if every character appears only once. Altogether:

def is_unique3(s):
dd = defaultdict(bool)
for char in s:
if dd[char]:
return False
dd[char] = True
return True
Enter fullscreen mode Exit fullscreen mode




Time Complexity for the Dictionary Method

What is our time complexity? We loop through the list N times, and it takes O(1) to access each item in the dictionary, so our complexity is O(N), even better than our previous approach. However, now we have a dictionary in our storage that wasn't there before. You'll find that time and space are tradeoffs when it comes to computer science, and making the best decision depends on the situation.

That's all for today, see you next week! If you have questions, or any ideas for what problem to tackle next, feel free to leave a comment.

<< Week 4: Binary Search | View Solution on Github | Week 6: Misc 3 >>

Sheamus Heikkila is formerly a Teaching Assistant at General Assembly Seattle. This blog is not associated with GA.

💖 💪 🙅 🚩
pythonwb
Whiteboarding in Python

Posted on October 12, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related

Heap sort algorithm
algorithms Heap sort algorithm

July 5, 2021

40 Python Projects ideas
python 40 Python Projects ideas

April 25, 2021

Road to Genius - Python Series level 3
beginners Road to Genius - Python Series level 3

January 20, 2021

Road to Genius - Python Series level 2
beginners Road to Genius - Python Series level 2

January 15, 2021

Road to Genius - Python Series level 1
beginners Road to Genius - Python Series level 1

January 10, 2021