Understanding Python Iterables: Generators
tyrael
Posted on August 31, 2024
Header by: Santiago Uribe Uribe Domínguez
Edited by: Sharmaigne Mabano
It is inarguable that one of Python's strongest suits is how it deals with iterables, or traversable objects.
When you're new to Python, something that immediately sticks out is the syntax of the for loop.
# prints 0 to 9
for i in range(10):
    print(i)
You'll find that Python strictly enforces the foreach control structure that can be found in other languages, and if you're used to the C-like way of writing for-loops, this might have, pun intended, thrown you into a loop.
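To see what that foreach style buys us, here is a small illustration of our own (the names list is just a made-up example): the same loop syntax works over any iterable, with no index bookkeeping needed.

names = ["Ada", "Grace", "Guido"]
for name in names:   # no counter variable, no manual indexing
    print(name)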
But in understanding this fundamental concept in Python, we actually allow ourselves to write more elegant and terser code. So let's incrementally tackle this topic.
Iterators
We'll start by defining iterators as objects we can traverse through one item at a time (think anything you can pass into a for loop), and iterables as objects that can be turned into iterators. Note that this traversal cannot go backwards.
Furthermore, we can actually create our own iterable classes.
class Fibonacci:
    def __iter__(self):
        self.a = 0
        self.b = 1
        return self

    def __next__(self):
        temp = self.a
        self.a = self.b
        self.b = temp + self.a
        return temp

for n in Fibonacci():
    print(n)
All that we have to define are the __iter__ and __next__ methods!
__iter__ returns the object to be used for iteration (this is what turns our iterable into an iterator) whenever our object is invoked in the context of an iterator (in this case, it is invoked as the iterable of a for loop). In this example, we use it as an initialization function of sorts, but you can just think of it as the constructor for your iterator instance.
If that had too much jargon, all that you need to know is that what __iter__ returns is the object to be used in iteration!
__next__ is the function that gets called on every iteration, its return value being the value for that iteration of the loop.
Side note: dunder (double underscore) methods are just special methods on objects. You can call them either by passing the object to the built-in function that shares the dunder method's base name (without the underscores), or by invoking the dunder method on the object directly as an attribute.
An example of this is:
a = Fibonacci() # we create our iterable object (this isn't an iterator yet)

# Let's try calling the __next__ dunder method
next(a) # AttributeError: 'Fibonacci' object has no attribute 'a' (because __iter__ hasn't been called)

b = iter(a) # We bring our object into an iterator context (this is done implicitly in for loops)

# both ways of calling the dunder method work
next(b)       # 0
b.__next__()  # 1
next(b)       # 1
next(b)       # 2
next(b)       # 3
For the keen-eyed, you might have noticed that even though we're allowed to traverse this iterable using __next__ or a for loop, you can actually keep getting the next value infinitely!
If this infinite looping behavior is not your intention, you can simply add a base case to your __next__ method by raising a StopIteration exception.
def __next__(self):
    # stop the looping
    if self.a > 100:
        raise StopIteration
    temp = self.a
    self.a = self.b
    self.b = temp + self.a
    return temp
This allows your for loops to stop iterating on your set condition. However, if you try to call __next__ on an iterator that has already hit StopIteration, it will raise StopIteration again, which shows up as a runtime error if nothing catches it.
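Here's a small sketch of our own, assuming the Fibonacci class above now uses the bounded __next__ we just wrote:

fib = Fibonacci()
for n in fib:     # the loop now stops on its own once a exceeds 100
    print(n)      # 0 1 1 2 3 5 8 13 21 34 55 89

next(fib)         # StopIteration is raised again: an uncaught runtime error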
In fact, we can see these dunder methods __iter__ and __next__ on any iterable object in Python, such as lists and tuples! We can use the __dir__ dunder method (via the built-in dir function) to check for all of the methods found in an object.
a = [1,2] # a list, a common iterable
print('__iter__' in dir(a)) # True
print('__next__' in dir(a)) # False
That's odd, the list object has an __iter__ method, but no __next__! Why is that?
Generators
Generators are special kinds of functions that, instead of returning a single value, return an iterator object. They don't do this with the return keyword, though: the presence of yield in the body is what implicitly makes the function return an iterator. Let's see this in action by re-implementing our Fibonacci iterator.
def Fibonacci():
    a = 0
    b = 1
    while a < 100:
        temp = a
        a = b
        b = temp + a
        yield a

a = Fibonacci() # our iterator object
print(next(a)) # 1
for num in a:
    print(num) # 1, 2, 3, 5, ... 144
We get our iterator object on the initial call of our function, a = Fibonacci(). This is evident by our ability to call __next__ on it and use it as an iterable in a for loop. However, we can clearly see that there is no explicit definition of a __next__ method, which means that it is baked into the logic of the function definition itself.
Now what does this mean? Let's try to follow the execution of our iterator over the first three __next__ calls (or for loop iterations):
--- first __next__ ---
a is defined
b is defined
we enter the while loop
we move forward with the fibonacci pattern
we yield a

--- second __next__ ---
we enter the while loop
we move forward with the fibonacci pattern
we yield a

--- third __next__ ---
we enter the while loop
we move forward with the fibonacci pattern
we yield a
...
We can see here that, for every __next__ call, our iterator runs through our defined function until it hits a yield, and that yielded value becomes the return value for that specific iteration. For subsequent iterations, we simply pick up from right after the yield where we left off.
This means that instead of having to raise StopIteration ourselves, we can simply let the function exit, greatly simplifying our conditions. Note that this also means return in generators becomes analogous to raise StopIteration (and return is only given a value in advanced cases we won't cover here)!
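For instance, here is a quick sketch of our own (bounded_fibonacci is just a made-up name, not from the article) showing return acting as the stopping point of a generator:

def bounded_fibonacci(limit):
    a, b = 0, 1
    while True:
        if a > limit:
            return        # ends the iteration, analogous to `raise StopIteration`
        yield a
        a, b = b, a + b

print(list(bounded_fibonacci(100)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]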
So really, we aren't learning anything new here in the context of iterators, just a terser syntax and a more accessible interface for creating them!
Let's go back to our cliffhanger from the last section by taking a look at the __dir__ of a generator.
def a(): # a function becomes a generator in the presence of a `yield` keyword
    yield 3

x = a() # we create our generator object
print('__iter__' in dir(x)) # True
print('__next__' in dir(x)) # True
Since generators do have the __iter__ and __next__ methods, we can confirm that a generator is in fact an iterator. And how is this relevant?
Recall that:

- Any function becomes a generator when it has a yield inside of it.
- Generators are simply functions that return an iterator object, with the function body (kind of) acting as its __next__ method.
- The __iter__ method implicitly returns the object used for iteration whenever the context calls for it.
  - These contexts include being cast into an iterator with iter, being used as the iterable of a for loop, and, as we'll later learn, being unpacked (see the short sketch after this list).
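Here's that short sketch, our own illustration of those contexts (letters is just a tiny made-up generator):

def letters():
    yield 'a'
    yield 'b'

it = iter(letters())   # explicitly cast into an iterator
print(next(it))        # a

for ch in letters():   # used as the iterable of a for loop
    print(ch)          # a, then b

x, y = letters()       # unpacked: each yielded value gets a name
print(x, y)            # a b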
We can deduce that, under the hood, native Python iterables define their __iter__ methods as generators, making __next__ absent from the class definition, but not from its iterator instance.
a = [1,2]
print('__iter__' in dir(a)) # True
print('__next__' in dir(a)) # False
b = iter(a) # we bring it into the context of an iterator
print('__iter__' in dir(b)) # True
print('__next__' in dir(b)) # True
Note that this assumption is actually just an oversimplification! Lists and tuples actually return special list_iterator and tuple_iterator objects respectively, demonstrating that you don't necessarily have to return self in the __iter__ method.
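You can verify this yourself with a quick check of our own:

a = [1, 2]
b = iter(a)
print(type(b))  # <class 'list_iterator'>
print(b is a)   # False: the iterator is a separate object, not the list itself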
Putting a generator in place of the __iter__ method is not only an elegant way of writing it, but it also serves to encapsulate the __next__ method inside the iterator instance itself, since, if you remember, we weren't even able to use __next__ until our object became an iterator anyway!
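To illustrate, here is a sketch of our own that rewrites the Fibonacci iterable with a generator standing in for __iter__ (the bound of 100 is simply carried over from earlier):

class Fibonacci:
    def __iter__(self):
        # the generator returned here is the iterator;
        # __next__ lives on that generator, not on Fibonacci itself
        a, b = 0, 1
        while a <= 100:
            yield a
            a, b = b, a + b

print(list(Fibonacci()))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]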
Laziness
Besides the fact that knowing how a language works is pretty cool, generators (and by extension, iterators) also serve as a massive point of optimization for most Python programs, which we can derive from the name itself: a generator generates values.
In fact, I would go as far as to say that coding in Python becomes infeasible without generators, all because of their lazy evaluation.
Lazy evaluation is not a new concept in Computer Science; in fact, it's a byproduct of functional programming, wherein we may delay the execution of a transformation on data until the point where we actually need it.
A good analogy of this is being given ingredients for a meal that you have to cook tonight. With all of those ingredients and your cooking gear and whatnot, you can argue that you technically already have the meal, you are simply not making it until tonight. This could be for various reasons, one of which could be that it is easier to store the ingredients than the entire meal itself.
When we created the Fibonacci generator, we could argue that we did have the Fibonacci numbers up to 144; we just weren't calculating them until we had to. This approach saves resources, since we don't end up allocating all of the memory up front: even if we wanted the Fibonacci numbers below ten million, we'd only ever hold one value at a time, while still, technically, "having" all of the numbers.
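A rough way to see the savings (our own illustration, with made-up variable names):

import sys

squares_list = [n * n for n in range(1_000_000)]  # every value sits in memory
squares_gen = (n * n for n in range(1_000_000))   # values are produced on demand

print(sys.getsizeof(squares_list))  # on the order of megabytes
print(sys.getsizeof(squares_gen))   # a few hundred bytes, regardless of the range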
Think of our ubiquitous range function as a classic example of this laziness: range doesn't keep a list in memory, but rather generates the numbers sequentially (we get a hint of this from the fact that iterating over it only moves in the direction indicated by the step parameter).
In fact, you should go ahead and try to implement some preexisting generators in your free time to see how well you understand Python! Notable ones are enumerate, zip, map, and filter. As for range, it actually requires a bit more prerequisite knowledge than you may originally expect, so we'll be covering it in a follow-up article.
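As a starting point, here's a minimal sketch of one possible approach to the first of those (my_enumerate is just our own name for it):

def my_enumerate(iterable, start=0):
    # yields (index, value) pairs, much like the built-in enumerate
    index = start
    for value in iterable:
        yield index, value
        index += 1

for i, ch in my_enumerate("abc"):
    print(i, ch)  # 0 a, then 1 b, then 2 c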
Good luck!