Python’s itertools: A Hidden Gem for Efficient Looping
Essi Alizadeh
Posted on June 8, 2023
Outline
- Introduction
- What is an iterator in Python?
- A deep dive into itertools library
- Conclusion
- References
Introduction
The itertools [1] module in Python is a powerful tool that provides a set of functions for creating iterators to support efficient looping and handling of sequences.
It's part of Python's standard library, meaning it's available in every Python installation.
Let's first talk about what a Python iterator is before diving into the itertools functions.
What is an iterator in Python?
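An iterator is a Python object that produces its items one at a time. Any object you can loop over is called an iterable, and looping over it works by first obtaining an iterator from it.
Iterators let you access the contents of a data container one element at a time without exposing its internal representation.
Python has several built-in types and functions that are iterable or return iterators.
Some of the more frequent ones are as follows:
- Basic data types: lists, tuples, strings, and dictionaries (these are iterables; calling iter() on them returns an iterator)
- Built-in functions: range() (returns an iterable range object), enumerate(), and zip() (both return iterators)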
How is an iterator defined in Python?
An iterator object must implement two special methods, __iter__() and __next__(), collectively known as the iterator protocol [2].
The __iter__() method returns the iterator object itself, and is required for your object to be used in any iteration context, such as a for loop.
The __next__() method returns the next value from the iterator.
If there are no more items to return, it should raise StopIteration.
class CountUpToThree:
    def __init__(self):
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count < 3:
            value = self.count
            self.count += 1
            return value
        else:
            raise StopIteration

counter = CountUpToThree()
for c in counter:
    print(c)
0
1
2
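The for loop calls these two methods behind the scenes. As a minimal sketch, the same iteration can be done by hand with the built-in iter() and next() functions; the list and variable names here are illustrative only.

```python
letters = ["A", "B"]          # a list is an iterable
letters_iter = iter(letters)  # iter() calls letters.__iter__() and returns an iterator

first = next(letters_iter)    # next() calls letters_iter.__next__()
second = next(letters_iter)

try:
    next(letters_iter)        # no items left, so StopIteration is raised
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, exhausted)  # A B True
```

A for loop does the same thing and simply stops quietly when StopIteration is raised.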
A deep dive into itertools library
At its core, itertools offers a suite of building block functions that allow you to iterate over data in a fast, memory-efficient, and developer-friendly way.
These functions can be categorized into three broad types:
- Infinite Iterators: These generate an infinite sequence of values.
- Combinatoric Iterators: These generate outputs by combining inputs in different ways. They are useful when you want to produce combinations or permutations of data.
- Iterators Terminating on the Shortest Input Sequence: These, like itertools.accumulate(), itertools.chain(), and itertools.takewhile(), produce values from finite input sequences and stop once the input is exhausted (or, for takewhile(), once a condition fails).
All iterators in Python output values sequentially, but itertools' operations may be chained together to construct more complicated iterators that can process big data sets without using a lot of memory.
Additionally, because itertools' operations are implemented in C (in CPython), they are often faster than comparable pure-Python code written using conventional loops.
Itertools is a useful tool for Python programmers because it can make loops more efficient and the code easier to read.
Itertools gives us a better way to run through lists, strings, dictionaries, files, and even our own custom data structures.
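To illustrate how itertools operations can be chained lazily, here is a small sketch. It uses islice(), an itertools function not covered elsewhere in this post, and the variable names are made up for the example.

```python
from itertools import count, islice, takewhile

# Squares of 1, 2, 3, ... computed lazily; nothing is evaluated yet
squares = map(lambda n: n * n, count(1))

# Keep squares only while they stay below 50 (this bounds the infinite stream)
small_squares = takewhile(lambda sq: sq < 50, squares)

# Take at most the first five of those values
first_five = list(islice(small_squares, 5))
print(first_five)  # [1, 4, 9, 16, 25]
```

No intermediate list is built at any step; values are only computed when list() pulls them through the pipeline.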
Infinite Iterators
Infinite iterators are a unique feature in the itertools module.
They produce an endless sequence of items, only stopping when we explicitly break the loop.
This can be particularly useful in scenarios where we have a repeating pattern or want to generate a continuous sequence.
However, you must be careful when using these to avoid creating an infinite loop in your program.
Let's look at the three main infinite iterator functions: count(), cycle(), and repeat().
count(start, step)
The count() function works similarly to the built-in range() function but, instead of stopping at a certain point, it continues indefinitely.
It takes two optional arguments: start (default 0) is the number at which the count begins, and step (default 1) is the increment.
from itertools import count

for idx in count(start=100, step=5):
    print(idx)
    if idx > 110:  # Break the loop to prevent an infinite loop
        break
100
105
110
115
In this example, we start counting from 100 and increase by 5 each time.
The loop will continue indefinitely unless we stop it.
Here, we stop it when idx gets larger than 110, which is why 115 is the last value printed.
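An explicit break is not the only way to bound count(). As a hedged sketch, pairing it with zip() also works, because zip() stops at its shortest input; the task names are hypothetical sample data.

```python
from itertools import count

tasks = ["load", "clean", "train"]  # hypothetical sample data

# zip() stops when tasks is exhausted, so the infinite counter is cut off safely
numbered = list(zip(count(start=1), tasks))
print(numbered)  # [(1, 'load'), (2, 'clean'), (3, 'train')]
```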
cycle(iterable)
The cycle() function cycles through an iterable indefinitely. This can be useful when you have a repeating pattern.
from itertools import cycle

n_printed = 0  # named n_printed so it does not shadow itertools.count
for item in cycle("ABC"):
    print(item)
    n_printed += 1
    if n_printed >= 5:  # Break the loop to prevent an infinite loop
        break
A
B
C
A
B
In this example, we're cycling through the string 'ABC'.
Once we reach 'C', it starts over with 'A' again.
We stop the loop after 5 iterations.
More advanced example: Cycle through a list
Suppose we want to cycle through a list indefinitely and print out the current item and the next item.
from itertools import cycle

items = ["A", "B", "C"]
cycled_items = cycle(items)  # an iterator that returns elements from the iterable indefinitely
current_item = next(cycled_items)  # advance the iterator by one item

for _ in range(5):
    next_item = next(cycled_items)
    print(f"Current item: {current_item}\nNext item: {next_item}\n")
    current_item = next_item
Current item: A
Next item: B
Current item: B
Next item: C
Current item: C
Next item: A
Current item: A
Next item: B
Current item: B
Next item: C
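Another common use of cycle() is round-robin assignment. The sketch below assumes a made-up list of workers and jobs; zip() ends when the jobs run out, so no explicit break is needed.

```python
from itertools import cycle

workers = ["alice", "bob"]                       # hypothetical names
jobs = ["job1", "job2", "job3", "job4", "job5"]  # hypothetical jobs

# zip() stops when jobs is exhausted, so cycle() never loops forever here
assignments = list(zip(jobs, cycle(workers)))
for job, worker in assignments:
    print(f"{job} -> {worker}")
```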
repeat(object, times)
The repeat() function simply repeats an object over and over again.
By default, it does this indefinitely, but you can also specify the number of times you want the object to be repeated.
from itertools import repeat

for i in repeat(["A", "B"], times=3):
    print(i)

print("\n")

for i in repeat("AB", times=3):
    print(i)
['A', 'B']
['A', 'B']
['A', 'B']
AB
AB
AB
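Here, we're repeating the list ['A', 'B'] and then the string 'AB', three times each.
Note that repeat() yields the same object each time rather than a copy.
Unlike the previous functions, repeat() can terminate on its own if we provide the times argument.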
These functions can be very handy in various scenarios.
They allow us to generate data on the fly without having to pre-generate large lists or sequences, making our code more memory efficient.
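One place where an unbounded repeat() is handy is supplying a constant argument to map(). A minimal sketch, relying on the fact that map() stops when its shortest input is exhausted:

```python
from itertools import repeat

# pow(base, 2) for each base in range(5); repeat(2) supplies the constant exponent
squares = list(map(pow, range(5), repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```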
Combinatoric Iterators
Combinatoric iterators generate all possible combinations, permutations, or Cartesian products (the set of all ordered tuples drawn from the inputs) of one or more iterables.
They are powerful tools when we need to consider all possible combinations of elements.
Here we'll focus on three functions: product(), permutations(), and combinations().
product(*iterables, repeat)
The product() function computes the Cartesian product of the input iterables. This is equivalent to nested for-loops.
The repeat argument (default 1) specifies how many times the input iterables are repeated.
For example, product(iterable, repeat=2) is the Cartesian product of the iterable with itself.
from itertools import product

for item in product(["A", "B"], repeat=2):
    print(item)
('A', 'A')
('A', 'B')
('B', 'A')
('B', 'B')
In this example, we're generating the Cartesian product of the list ['A', 'B'] with itself. This gives us all possible ordered pairs of 'A' and 'B', each as a tuple.
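product() also accepts several different iterables. The following sketch, using made-up sizes and colors, shows that it yields the same tuples, in the same order, as nested for-loops.

```python
from itertools import product

sizes = ["S", "M"]         # hypothetical sample data
colors = ["red", "blue"]

pairs_from_product = list(product(sizes, colors))

# The equivalent nested loops, written as a list comprehension
pairs_from_loops = [(size, color) for size in sizes for color in colors]

print(pairs_from_product)
print(pairs_from_product == pairs_from_loops)  # True
```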
permutations(iterable, r)
The permutations() function generates all possible permutations of the input iterable. You can specify the length of the permutations using the r argument. If r is not specified, it defaults to the length of the iterable.
from itertools import permutations

for item in permutations("ABC", r=2):  # equivalent to permutations(["A", "B", "C"], 2)
    print(item)
('A', 'B')
('A', 'C')
('B', 'A')
('B', 'C')
('C', 'A')
('C', 'B')
Here, we're generating all possible 2-element permutations of the string 'ABC'.
Each permutation is a tuple of two characters.
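For comparison, here is a short sketch of the default case where r is omitted, so each permutation uses every element. The count is checked against 3! with math.factorial.

```python
from itertools import permutations
from math import factorial

full_permutations = list(permutations("ABC"))  # r defaults to len("ABC") == 3

print(full_permutations[0])    # ('A', 'B', 'C')
print(len(full_permutations))  # 6, i.e. 3!
```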
combinations(iterable, r)
The combinations() function generates all possible combinations of the input iterable.
The r argument specifies the length of the combinations and is required. Unlike permutations, combinations don't consider the order of elements, so ('A', 'B') and ('B', 'A') count as the same combination.
from itertools import combinations

for item in combinations(["A", "B", "C"], r=2):
    print(item)
('A', 'B')
('A', 'C')
('B', 'C')
Here, we generate every two-element combination of the items in the list ["A", "B", "C"].
These operations come in handy when trying to solve a problem that requires us to think about every conceivable combination or subset of the given items.
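The difference between the two functions shows up in how many results they produce. This sketch compares the counts with math.comb() and math.perm(), which assumes Python 3.8 or newer; the item list is illustrative.

```python
from itertools import combinations, permutations
from math import comb, perm  # available in Python 3.8+

items = ["A", "B", "C", "D"]

n_combinations = len(list(combinations(items, 2)))  # order ignored
n_permutations = len(list(permutations(items, 2)))  # order matters

print(n_combinations, comb(4, 2))  # 6 6
print(n_permutations, perm(4, 2))  # 12 12
```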
Terminating Iterators
Terminating iterators take one or more finite input iterables and produce an output iterator that ends once the input is exhausted.
They are used to transform, group, or combine the input in some way.
For this section, we'll focus on accumulate(), groupby(), and chain().
accumulate(iterable, func)
The accumulate() function returns running totals, or the running results of another binary function passed as func.
In the absence of a specified function, addition is used.
from itertools import accumulate

list_ = [3, 4, 6, 2, 1, 9, 8]
for item in accumulate(list_, func=max):
    print(item)

# Running maximum after each element is consumed:
# [3] -> 3
# [3, 4] -> 4
# [3, 4, 6] -> 6
# [3, 4, 6, 2] -> 6
# [3, 4, 6, 2, 1] -> 6
# [3, 4, 6, 2, 1, 9] -> 9
# [3, 4, 6, 2, 1, 9, 8] -> 9
3
4
6
6
6
9
9
In this example, we're using accumulate() with the max function to print the maximum value encountered at each step in the list.
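For completeness, a small sketch of the default behavior (running sums) next to a running product built with operator.mul; the input values are arbitrary.

```python
from itertools import accumulate
import operator

values = [1, 2, 3, 4]

running_sum = list(accumulate(values))                    # addition is the default
running_product = list(accumulate(values, operator.mul))  # any two-argument function works

print(running_sum)      # [1, 3, 6, 10]
print(running_product)  # [1, 2, 6, 24]
```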
groupby(iterable, key)
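The groupby() function makes an iterator that returns consecutive keys and groups from the iterable. The key is a function that computes a key value for each element.
Because only consecutive elements with the same key are grouped together, the input usually needs to be sorted by that key first.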
from itertools import groupby

list_ = [
    ("apple", "fruit"),
    ("orange", "fruit"),
    ("lettuce", "vegetable"),
    ("spinach", "vegetable"),
]
for key, group in groupby(list_, key=lambda x: x[1]):
    print(f'"{key}" group: ', list(group))
"fruit" group: [('apple', 'fruit'), ('orange', 'fruit')]
"vegetable" group: [('lettuce', 'vegetable'), ('spinach', 'vegetable')]
In this case, we're grouping a list of tuples according to their second element (hence x[1]), which is either fruit or vegetable.
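groupby() starts a new group every time the key changes, so the order of the input matters. The sketch below uses a deliberately unordered, made-up list to show the effect and the usual remedy of sorting by the same key first.

```python
from itertools import groupby

foods = [
    ("apple", "fruit"),
    ("lettuce", "vegetable"),
    ("orange", "fruit"),  # same key as "apple", but not adjacent to it
]

def kind(pair):
    """Key function: the category stored in the second tuple element."""
    return pair[1]

# Without sorting, "fruit" shows up as two separate groups
unsorted_keys = [key for key, _ in groupby(foods, key=kind)]
print(unsorted_keys)  # ['fruit', 'vegetable', 'fruit']

# Sorting by the same key first brings equal keys together
sorted_groups = {
    key: [name for name, _ in group]
    for key, group in groupby(sorted(foods, key=kind), key=kind)
}
print(sorted_groups)  # {'fruit': ['apple', 'orange'], 'vegetable': ['lettuce']}
```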
chain(*iterables)
The chain() function is used to treat multiple iterables as one continuous sequence.
from itertools import chain

list_1 = ["A", "B"]
list_2 = [1, 2, 3]
s = "cd"
for each in chain(list_1, list_2, s):
    print(each)
A
B
1
2
3
c
d
In this example, we're using chain() to treat two lists and a string as if they were one long sequence and iterating over their contents.
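When the inputs are already collected in a single container, the alternative constructor chain.from_iterable() can be used instead of unpacking them as separate arguments. A minimal sketch with the same sample values:

```python
from itertools import chain

nested = [["A", "B"], [1, 2, 3], "cd"]  # one iterable holding the inputs

# Equivalent to chain(*nested), but the outer iterable is consumed lazily
flat = list(chain.from_iterable(nested))
print(flat)  # ['A', 'B', 1, 2, 3, 'c', 'd']
```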
Conclusion
In conclusion, the itertools module is a hidden gem in Python that enables simpler, more efficient code to be written when dealing with iterations.
It simplifies our work by providing a set of tools for building and manipulating iterators that can handle complicated iteration patterns.
As we deal with bigger datasets, efficiency in terms of memory use also becomes more crucial.
In this post, we covered three main classes of itertools functions: infinite iterators, combinatoric iterators, and terminating iterators.
Despite its benefits, itertools is still one of Python's lesser-known standard libraries.
itertools is a valuable part of every Python programmer's toolkit because of the variety of powerful capabilities it offers for looping, iterating, and producing combinations or permutations.
Learning itertools is a good investment of time, whether you're an experienced Pythonista wanting to hone your coding skills or a beginner trying to get a feel for Python's potential.
📓 This notebook is accompanying the article https://ealizadeh.com/blog/itertools/.
References
[1] Python Software Foundation, “itertools — Functions creating iterators for efficient looping,” May 23, 2023. https://docs.python.org/3/library/itertools.html
[2] Python Software Foundation, “The Python Standard Library » Built-in Types,” May 25, 2023. https://docs.python.org/3/library/stdtypes.html#iterator-types