Binary Search
James Robb
Posted on April 26, 2020
Binary search is a blazingly-fast search algorithm that has a worst case Big O of Ο(log n)
, the average case is O(log n)
and the best case is O(1)
. In short it's a search algorithm that is about as fast as you can get.
Binary Search utilises divide and conquer to reduce much of the potential runtime. In fact, it is a lot like merge sort in how it functions but of course one algorithm is used to sort data and this one is used to find data.
For the algorithm to run properly, the data must be pre-sorted and be have elements of the same or similar types held within. So if your collection contains strings and integers, it won't work, you must have elements using a consistent data type withinin your collection.
When the binary search is executed, it will take an input collection
and a value
to find and if the value exists we will return the index of that value
within the collection
and if it doesn't we will return -1
out of convention to represent that the value
doesn't exist in the collection
after all.
Tests
from main import binarySearch;
def test_strings():
assert binarySearch(["Dave", "James"], "Dave") == 0;
assert binarySearch(["Dave", "James"], "abc") == -1;
def test_integers():
assert binarySearch([1, 2, 3,4], 3) == 2;
assert binarySearch([1, 2, 3, 4], 5) == -1;
def test_floats():
assert binarySearch([3.14], 3.14) == 0;
assert binarySearch([3.141], 3.14) == -1;
def test_booleans():
assert binarySearch([False, True], True) == 1;
assert binarySearch([True], False) == -1;
def test_lists():
assert binarySearch([[3.14]], [3.14]) == 0;
assert binarySearch([[3.141]], [3.14]) == -1;
def test_nested_lists():
assert binarySearch([[3.14, [1]]], [3.14, [1]]) == 0;
assert binarySearch([[3.141]], [3.14]) == -1;
def test_sets():
assert binarySearch([set([1])], set([1])) == 0;
assert binarySearch([set([1])], set([1,2])) == -1;
def test_tuples():
assert binarySearch([tuple([0,2]), tuple([1])], tuple([1])) == 1;
assert binarySearch([tuple([1])], tuple([1, 2])) == -1;
In python, as with many other languages, some elements can't compare with one another in a non-hacky manner, even if they are both the same type. An example of this is that types such as dict
cannot run comparisons like {'a': 1} > {'a': 1}
.
For binary search we do need to use comparison operators like that though and so I represented only the types that are comparible with the ==
, >
, >=
, <
and <=
operators that binary search utilises.
Here is a repl to run and play with the tests:
Implementation
There are a few ways to skin this particular cat. For that reason, I will show you two implementations for the Binary Search algorithm. One will be using an imperative approach and another using Recursion.
Both implementations expect uniform data and are typed for this reason. We expect data to contain consistent types in the input collection so that the search is guaranteed to be able to compare the items in the collection against one another. If we didn't do this we may get unexpected errors such as this:
TypeError: '>' not supported between instances of 'list' and 'str'
Just remember to normalise and sort your data before you begin using the implementations that will be outlined below or if you decide to build your own implementation in the future as these traits are a requirement of the algorithm.
The imperative approach
Imperative programming is a programming paradigm which describes how a programme should run. This paradigm uses statements to alter a programmes state and generate values.
from typing import List, TypeVar;
from operator import eq as deep_equals, gt as greater_than, ne as not_equal, le as less_than_or_equal;
from math import floor;
T = TypeVar('T');
def binarySearch(collection: List[T], value: T) -> int:
start = 0;
end = len(collection) - 1;
middle = floor((start + end) / 2);
while(not_equal(collection[middle], value) and less_than_or_equal(start, end)):
if greater_than(value, collection[middle]): start = middle + 1;
else: end = middle - 1;
middle = floor((start + end) / 2);
return middle if deep_equals(collection[middle], value) else -1;
We begin by importing some helpers from the python standard library which has a whole swath of helper functions for us to use in our code.
A point worth noting is the use of
TypeVar
from thetyping
module to align our input types for thecollection
andvalue
.All this means is that when we recieve the collection, it should have items of consistent type
T
and thevalue
we want to search for should also be of typeT
.So if we have the
collection
of typeList[str]
for example then value must also be anstr
but instead of hard-coding this, we letTypeVar
do the work for us.
Our binarySearch
function will recieve a collection
to search within and a value
to look for in that collection
. If the value
is found then we return it's location (index) in the collection
.
To find the index we setup the start
, end
and middle
indexes of the collection
. While the value
is not equal to the value in the middle
of the collection
and the start
index is less than or equal to the end
index, we check if the value
is bigger than the middle
value, if it is, we set the start
index to begin on the next iteration from the middle
, otherwise we set the end to be the middle
.
We do this because if the
value
is greater than the value in themiddle
of thecollection
, thevalue
must be on the right half of the collection and vice versa for the opposite case.
Either way the condition falls we will then reset the middle
index to be in between the new start
and end
indexes. We continue with this process until the condition on the while loop is no longer true.
When the while loop exits, we check if the middle
indexes value is the value
we are looking for. If it is, we return the middle
index and if it isn't then the value
wasn't in the input collection and we return -1
to represent this case.
The recursive approach
from typing import List, TypeVar, Union;
from operator import eq as deep_equals, gt as greater_than, ne as not_equal, le as less_than_or_equal, ge as greater_than_or_equal;
from math import floor;
T = TypeVar('T');
def binarySearch(collection: List[T], value: T, start: int = 0, end: Union[int, None] = None) -> int:
if deep_equals(end, None):
end = len(collection) - 1;
if greater_than_or_equal(end, start):
middle = floor((start + end) / 2);
if deep_equals(collection[middle], value):
return middle;
if greater_than(collection[middle], value):
return binarySearch(collection, value, start, middle - 1);
return binarySearch(collection, value, middle + 1, end);
return -1;
Exactly like the imperative approach outlined above, we begin by importing some helpers from the python standard library which has a whole swath of helper functions for us to use in our code.
A point worth noting is the use of
TypeVar
from thetyping
module to align our input types for thecollection
andvalue
.All this means is that when we recieve the collection, it should have items of consistent type
T
and thevalue
we want to search for should also be of typeT
.So if we have the
collection
of typeList[str]
for example then value must also be anstr
but instead of hard-coding this, we letTypeVar
do the work for us.
The recursive version naturally calls itself again and again and requires a couple more parameters to be provided on each call than the imperative approach does. Namely we need to know where to the start
index and end
index are for the current recursive call of the binarySearch
function.
Initially the start
will be at 0
which makes sense since we want to begin at the start of the collection
and the end
initially will be set to None
but will be set to the length of the collection
mins one once the function first runs.
Note: We set
end
to initialise itself with a value ofNone
because in python you can't reference other parameters to use as default values of other parameters. To get around this, we set it's type toUnion[None, int]
which just means this can be of typeNone
orint
. Once the function first executes we set theend
value to the length of thecollection
mins one because if acollection
contains 5 items the last index would be 4 since lists/arrays are always 0 indexed.
If the end
index is greater than or equal to the starting index we calculate where the middle
index of the current collection
is and consider 3 cases:
- If the item at the
middle
index equals the value we are looking for, return themiddle
index - If the item at the
middle
index is greater than the value we are looking for then run a recursive call tobinarySearch
to check the left half of thecollection
for thevalue
- If the item at the
middle
index is less than the value we are looking for then run a recursive call tobinarySearch
to check the right half of thecollection
for thevalue
If none of these recursive calls manage to find the value
in the collection
then we return -1
by default.
Conclusions
The Binary Search algorithm is always a fantastic choice to find values within sorted and type-consistent data sets.
Personally I try to always keep data within applications as uniform as possible and thus, I find myself using this algorithm relatively often in my day to day work. I highly recommend looking into it further and seeing how it can benefit you and your work too!
I hope you learned something new today and if you have any questions, ideas or comments, feel free to comment them below!
Posted on April 26, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.