Pointers? In My Python? It's More Likely Than You Think - Part 3: Object Lifetimes and Garbage Collection

eliholderness

Eli Holderness

Posted on August 24, 2022


This is the third of a three-part series which covers various aspects of Python's memory management. It started life as a conference talk I gave in 2021, titled 'Pointers? In My Python?' and the most recent recording of it can be found here.

Check out Part 1 and Part 2 of the series - or read on for a discussion of object lifetimes, reference counting, and garbage collection in CPython!

How CPython can tell when you're done with an object, and what happens next

We ended Part 2 by asking a question: once we've created an object x, how and why does its 'lifetime' end? In this article, we'll learn the answer by exploring how CPython frees objects from memory. CPython isn't the only implementation of Python - for example, there's Skulpt, which Anvil uses to run Python in the browser - but it's the one we'll focus on in this article.

We ended Part 2 with an exploration of the weird and wonderful things that can happen when you override the __eq__ magic method in Python. Now, in Part 3, we're going to look at doing the same thing with a different magic method: __del__.

The __del__ magic method

The __del__ magic method, also called the finaliser of an object, is a method that is called right before an object is about to be removed from memory. It doesn't actually do the work of removing the object from memory - we'll see how that happens later. Instead, this method is meant to be used to do any clean-up work that needs to happen before an object is removed - for example, closing any files that were opened by the object when it was created.

We're going to be using the following class as an example throughout this section:

class MyNamedClass:
  def __init__(self, name):
    self.name = name

  def __del__(self):
    print(f"Deleting {self.name}!")

This is just a class that'll let us know when one of its instances is about to be removed from memory - or, more specifically, when Python expects to immediately remove the class instance from memory (this won't always be true, as we'll see!).

In the above example, we've defined our class to take a name input when initialised, and when the finaliser is called, it'll let us know by printing the name of the instance in question. That way, we can get a bit of insight into which of these objects are being removed from memory, and when.

So, when will CPython decide to remove an object from memory? There are (as of CPython 3.10) two ways this happens: Reference Counting and Garbage Collection.

Reference counting in CPython

If we have a pointer to an object in Python, that's a reference to that object. For a given object a, CPython keeps track of how many other things point at a. If that counter reaches zero, it's safe to remove that object from memory, since nothing else is using it. Let's see an example:

>>> jane = MyNamedClass("Jane")
>>> del jane
Deleting Jane!

Here we create a new object (MyNamedClass("Jane")) and create a pointer that points at it (jane =). Then, when we del jane, we remove that reference, and the MyNamedClass instance now has a reference count of 0. So, CPython decides to remove it from memory - and, right before that happens, its __del__ method is called, which prints out the message we see above.

If we create multiple references to an object, we'll have to get rid of all of them in order for the object to be removed:

>>> bob = MyNamedClass("Bob")
>>> bob_two = bob # creating a new pointer to the same object
>>> del bob # this doesn't cause the object to be removed...
>>> del bob_two # ... but this does
Deleting Bob!
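If you'd like to watch the counter itself move, the standard library's sys.getrefcount can report it. The snippet below is a minimal sketch: the Person class is a hypothetical stand-in for MyNamedClass, and the absolute numbers are deliberately not shown, because getrefcount includes temporary references of its own and the exact values can vary between CPython versions. What matters is how the count changes as pointers are added and removed.

```python
import sys


class Person:
    """Hypothetical stand-in for the article's MyNamedClass."""

    def __init__(self, name):
        self.name = name


bob = Person("Bob")
# Baseline count while only the name `bob` points at the object.
count_one_name = sys.getrefcount(bob)

bob_two = bob  # a second pointer to the same object
count_two_names = sys.getrefcount(bob)

del bob_two  # removing a pointer lowers the count again
count_after_del = sys.getrefcount(bob)

print("extra references while bob_two exists:", count_two_names - count_one_name)
print("extra references after del bob_two:", count_after_del - count_one_name)
```

Creating bob_two raises the count by one, and deleting it brings the count back to where it started.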

Of course, our instances of MyNamedClass could themselves contain pointers - after all, they're arbitrary Python objects, and we can add whatever attributes we like to them. Let's see an example:

>>> jane = MyNamedClass("Jane")
>>> bob = MyNamedClass("Bob")
>>> jane.friend = bob # now the "Jane" object contains a pointer to the "Bob" object...
>>> bob.friend = jane # ... and vice versa

What we've done in the above code snippet is set up some cyclic references. The object whose name is Jane contains a pointer to the one whose name is Bob, and vice versa. Where this gets interesting is when we do the following:

>>> del jane
>>> del bob

We've now removed the pointers that go from the namespace to the objects. Now, we can't access those MyNamedClass objects at all - but we didn't get the print message telling us they're about to be deleted. This is because there are still references to these objects, contained within each other, and therefore their reference counts are not 0.

What we've created here is a cyclic isolate; a structure where each object has at least one reference within the cycle, keeping it alive, but none of the objects in the cycle can be accessed from the namespace.

Below is a visual representation of what's going on when we create a cyclic isolate.

To begin, we create our two objects, each of which also has a name in the namespace.

A diagram showing the two Python objects 'bob' and 'jane', each with a pointer from the namespace, and each pointing to a string object

Next, we connect our two objects by adding a pointer from each to the other.

'jane' and 'bob' now also contain a 'friend' attribute, which point to the other object

Finally, we remove the pointers from the namespace by removing both of the original names for our objects. At this point, the two objects are inaccessible from the namespace, but each contains a pointer to the other so their reference counts are not zero.

The namespace pointers for 'jane' and 'bob' have been removed.

So, clearly, reference counting on its own isn't sufficient for keeping the working memory of your runtime free of useless, irretrievable objects. This is where CPython's Garbage Collector comes in!
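We can check that a cyclic isolate really does stay in memory by using a weak reference, which lets us look at an object without adding to its reference count. This is a minimal sketch rather than anything from the original talk: Person is a hypothetical stand-in for MyNamedClass without the printing finaliser, and automatic collection is paused with gc.disable() purely so the timing is predictable.

```python
import gc
import weakref


class Person:
    """Hypothetical stand-in for MyNamedClass, without the printing finaliser."""

    def __init__(self, name):
        self.name = name


gc.disable()  # pause automatic collection so nothing is cleaned up behind our back

jane = Person("Jane")
bob = Person("Bob")
jane.friend = bob
bob.friend = jane

# A weak reference does not count towards the object's reference count.
jane_ref = weakref.ref(jane)

del jane
del bob

# The names are gone, but the cycle keeps both objects alive.
alive_after_del = jane_ref() is not None

gc.collect()  # ask the garbage collector to find and clear the isolated cycle
alive_after_collect = jane_ref() is not None

gc.enable()
print(alive_after_del, alive_after_collect)  # True False
```

The weak reference still resolves after both names are deleted, and only goes dead once the garbage collector has run.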

Collecting garbage in CPython

CPython's Garbage Collector (or GC for short) is Python's built-in way to get around the problem of cyclic isolates that we just encountered. By default, it's always running in the background, and it'll work its magic every now and then so you don't have to worry about cyclic isolates clogging up your memory.

The garbage collector is designed to find and remove cyclic isolates from CPython's working memory. It does this in the following way:

  1. It detects cyclic isolates
  2. It calls the finalisers (the __del__ methods) on each object in the cyclic isolate
  3. It removes the pointers from each object (thus breaking the cycle) - only if the cycle is still isolated after step 2 (more on this later!)

After this process is complete, every object that was previously in the cycle will now have a reference count of 0, and therefore will be removed from memory.

Although it works automatically, we can actually import it as a module from the standard library. Let's do that, so we can take an explicit look at how it works!

>>> import gc

Detecting cyclic isolates

CPython's garbage collector keeps track of various objects that exist in memory - but not all of them. We can instantiate some objects and see whether the garbage collector cares about them:

>>> gc.is_tracked("a string")
False

>>> gc.is_tracked(["a", "list"])
True

If an object can contain pointers, that gives it the ability to form part of a cyclic isolate structure - and that's what the garbage collector exists to detect and dismantle. Such objects in Python are often called 'container objects'.

So, the garbage collector needs to know about any object that has the potential to exist as part of a cyclic isolate. Strings can't, so "a string" isn't tracked by the garbage collector. Lists (as we've seen) are able to contain pointers, and therefore ['a', 'list'] is tracked.

Any instance of a user-defined class will also be tracked by the garbage collector, as we can always set arbitrary attributes (pointers) on them.

>>> jane = MyNamedClass("Jane")
>>> gc.is_tracked(jane)
True
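To see the split between container and non-container objects in one place, we can run gc.is_tracked over a handful of values. This is a small sketch: the Empty class and the samples dictionary are made up for illustration, and the examples stick to types whose tracking behaviour is straightforward (some containers, such as tuples and dicts, have version-specific optimisations that are left out here).

```python
import gc


class Empty:
    """Hypothetical user-defined class with no attributes of its own."""


samples = {
    "int": 42,
    "float": 3.14,
    "str": "a string",
    "None": None,
    "list": ["a", "list"],
    "instance": Empty(),
}

# Ask the garbage collector about each sample in turn.
tracked = {label: gc.is_tracked(obj) for label, obj in samples.items()}

for label, flag in tracked.items():
    print(f"{label}: {flag}")
```

The numbers, the string and None come back as False, since they can't hold pointers to other objects, while the list and the class instance come back as True.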

So, the garbage collector knows about all the objects that could potentially form a cyclic isolate. How does it know if one has formed? Well, it also knows about all the pointers in each of those objects, and where they point. We can see this in action:

>>> my_list = ["a", "list"]
>>> gc.get_referents(my_list)
['list', 'a']

The get_referents method (also called a traversal method) takes an object, and returns a list of the objects it contains pointers to (its referents). So, the list above contains pointers to each of its elements, which are both strings.

Let's take a look at the get_referents method in the context of a cycle of objects (not yet a cyclic isolate, though, since these objects can still be accessed from the namespace):

>>> jane = MyNamedClass("Jane")
>>> bob = MyNamedClass("Bob")
>>> jane.friend = bob
>>> bob.friend = jane
>>> gc.get_referents(bob)
[{'name': 'Bob', 'friend': <__main__.MyNamedClass object at 0x7ff29a095d60>}, <class '__main__.MyNamedClass'>]

In this cycle, we can see that the object pointed to by bob contains pointers to the following: a dictionary of its attributes, containing bob's name ('Bob') and its friend (the MyNamedClass instance also pointed at by jane). The bob object also has a pointer to the class object itself, since bob.__class__ will return that class object.

When the garbage collector runs, it checks whether every object it knows about (that is, anything that returns True when you call gc.is_tracked on it) is reachable from the namespace. It does this by following all the pointers from the namespace, and pointers within the objects that those point to, and so on, until it builds up an entire view of everything that's accessible from code.

If, after doing this, the GC finds that there exist objects which aren't reachable from the namespace, then it can clear those objects up.
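The idea of reachability can be sketched in a few lines using gc.get_referents. To be clear, this is a toy illustration of 'following the pointers', not CPython's actual collector, which is written in C and organised quite differently. The function name reachable_ids is made up for this example, and plain lists are used as the objects so that the walk stays small.

```python
import gc


def reachable_ids(roots):
    """Return the ids of every object reachable from `roots` by following pointers."""
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue  # already visited, which also stops us looping around cycles
        seen.add(id(obj))
        stack.extend(gc.get_referents(obj))
    return seen


# Two lists that point at each other, plus one that nothing in the cycle points at.
jane = ["Jane"]
bob = ["Bob"]
jane.append(bob)
bob.append(jane)
loner = ["Loner"]

found = reachable_ids([jane])
print(id(bob) in found)    # True: reachable via jane
print(id(loner) in found)  # False: nothing reachable from jane points at it
```

Anything whose id never turns up in a walk that starts from the namespace is, in the terms used above, unreachable.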

Remember, any objects that are still in memory must have a non-zero reference count, or else they'd have been removed due to reference counting. For objects to be unreachable and yet still have a non-zero reference count, they have to be part of a cyclic isolate, which is why we care so much about the possibility of these occurring.

Let's return to our cycle of friends, jane and bob, and turn that cycle into a cyclic isolate by removing the pointers from the namespace:

>>> del jane
>>> del bob

Now, we've got ourselves into the exact situation that the garbage collector exists to fix. We can trigger manual garbage collection by calling gc.collect():

>>> gc.collect()
Deleting Bob!
Deleting Jane!
4

By default, the garbage collector will perform this action automatically every so often (as more and more objects are created and destroyed within the CPython runtime).
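If you're curious about what 'every so often' means on your own interpreter, the gc module will tell you. The sketch below only reads the current settings. The values themselves are not shown here because the defaults differ between CPython versions.

```python
import gc

# Whether automatic collection is currently switched on.
enabled = gc.isenabled()

# The thresholds that decide when an automatic collection is triggered.
thresholds = gc.get_threshold()

# The current counters that are compared against those thresholds.
counts = gc.get_count()

print("automatic collection enabled:", enabled)
print("thresholds:", thresholds)
print("current counts:", counts)
```

Both get_threshold and get_count return three numbers, one per 'generation' of objects that the collector keeps track of.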

The output that we see in the code snippet above contains the print statements from our MyNamedClass's __del__ method, and at the end there's a number - in this case, 4. This number is output from the garbage collector itself, and it tells us how many objects were removed.

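You might think that only 2 objects (our two MyNamedClass instances) were removed, so where do the other two come from? The count only covers objects that the garbage collector tracks, and we saw earlier that strings aren't tracked, so the name strings aren't part of it. The extra two are the attribute dictionaries belonging to each instance (the same kind of dictionary we saw in the output of gc.get_referents), which are container objects in their own right. The name strings are dealt with by ordinary reference counting once nothing points at them any more. The exact number can differ between CPython versions, since newer versions store instance attributes differently.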

Finalisers behaving badly

Earlier, we mentioned that the garbage collector works in a 3-step process: detecting cyclic isolates, calling the finalisers on each object in the cycle, then breaking the cycle by removing the pointers between the objects... if the cycle still remains isolated at this point. Now, the only way that the cycle could go from being isolated to not-isolated between the first and third step is if the finalisers do something to make that happen.

Let's define a class that does just that:

class MyBadClass:
  def __init__(self, name):
    self.name = name

  def __del__(self):
    global person # create an externally accessible pointer...
    person = self # ... and point it at the object about to be removed
    print(f"Deleting {self.name}!")

In this class's finaliser, a global variable is created. That means that even if an instance of MyBadClass becomes inaccessible from the namespace (as part of a cyclic isolate, for example), it can still 'reach out' into the namespace, create a pointer there, and point that pointer at itself - thus de-isolating itself.

>>> jane = MyBadClass("Jane")
>>> bob = MyBadClass("Bob")
>>> jane.friend = bob
>>> bob.friend = jane
>>> del jane
>>> del bob

To see this in action, we set up the cyclic isolate structure, as we've done before with other (more well-behaved) classes. Then, we trigger garbage collection:

>>> gc.collect()
Deleting Bob!
Deleting Jane!
0

We see the print statements from each instance's __del__ method, but after that, the garbage collector prints us out a 0. That means that no objects were removed from memory - and that's because, after the garbage collector caused the finalisers to be called, it checked to make sure that the cycle was still isolated.

If the cycle were still isolated, then the garbage collector could safely remove all the pointers linking up the objects, reducing their reference counter to 0. But, in this case, the cycle was no longer isolated, and so the garbage collector doesn't break the links between the objects in it.

So, if we got rid of the jane and bob pointers, how can the cycle still be accessed from the namespace? The answer is that global person variable that was created in the finaliser. Let's take a look at it:

>>> person
<__main__.MyBadClass object at 0x7ff29a095d60>

>>> person.name
'Jane'

>>> person.friend.name
'Bob'

We can see that the object pointed to by person is the same one that had previously been pointed to by jane. This makes sense if you look at the above output from calling gc.collect(); the print statement that appeared last was the one for the Jane object, and therefore that was the object that set person = self most recently.

In other words, the two objects have had their original pointers jane and bob removed - but when their finalisers are called, a new external pointer from the namespace is created, meaning that the cycle is no longer isolated and shouldn't be removed by the GC.

Doing this sort of thing can create strange results, because it means you can access objects whose finalisers have already run - and that probably means they've cleaned themselves up in a way that means you shouldn't be interacting with them again. For example, their finalisers may have closed a file that other methods on the object will assume to still be open. Once again: overriding magic methods is serious business!
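Here's a small sketch of that exact failure mode. Everything in it is hypothetical and made up for illustration: LeakyBuffer is a class whose finaliser closes an in-memory file and then (badly) saves a pointer to itself in a list called survivors. No cycle is needed here, since the finaliser also runs when the reference count hits zero.

```python
import io

survivors = []  # hypothetical list that the finaliser uses to keep the object alive


class LeakyBuffer:
    def __init__(self):
        self.buffer = io.StringIO()  # stands in for a real open file

    def write(self, text):
        self.buffer.write(text)

    def __del__(self):
        self.buffer.close()     # clean-up: close the 'file'...
        survivors.append(self)  # ... then badly behave by saving a pointer to ourselves


buf = LeakyBuffer()
buf.write("hello")
del buf  # the reference count hits zero, so the finaliser runs

zombie = survivors.pop()  # the object is still here, but its 'file' is closed
error = None
try:
    zombie.write("more")
except ValueError as exc:
    error = str(exc)

print(error)
```

The object survives its own finaliser, but writing to it now fails because the underlying buffer has already been closed.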

So, does MyBadClass break garbage collection entirely? The answer is no, and that's because of a very important property of finalisers: they can only be called once per object. After bob's __del__ method has been called once (when it was triggered by the call to gc.collect()), it's done, and can never be executed again. That means we can do the following:

>>> del person
>>> gc.collect()
4

We don't see the "Deleting Jane!" and "Deleting Bob!" messages, because those are printed by the objects' finalisers - and those have already been called once, and can't be called again. But, since the person pointer has been removed, the cycle is isolated again; and, because that person pointer won't be recreated by a finaliser, the garbage collector can safely go ahead and remove the pointers linking our two MyBadClass instances.

Then the garbage collector continues its work, printing out a 4 (the same count we saw for our well-behaved class) to let us know that the cycle has finally been removed from memory - and all is well in the world (of our CPython interpreter, at least!) again!
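The 'once per object' rule can be checked by recording finaliser calls in a list instead of printing them. This is a sketch under a few assumptions: Phoenix is a hypothetical variant of MyBadClass that saves itself into a list called resurrected rather than a global variable, and calls is a made-up log of which finalisers have run.

```python
import gc

calls = []        # hypothetical log of finaliser calls
resurrected = []  # hypothetical list the finaliser uses to de-isolate the cycle


class Phoenix:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        calls.append(self.name)
        resurrected.append(self)  # point something external at the doomed object


jane = Phoenix("Jane")
bob = Phoenix("Bob")
jane.friend = bob
bob.friend = jane
del jane
del bob

gc.collect()  # both finalisers run, and the cycle is no longer isolated
calls_after_first = sorted(calls)

resurrected.clear()  # remove the external pointers, isolating the cycle again
gc.collect()         # the cycle is cleared without running the finalisers again
calls_after_second = sorted(calls)

print(calls_after_first)   # ['Bob', 'Jane']
print(calls_after_second)  # ['Bob', 'Jane']
```

The log is the same after both collections: each finaliser ran exactly once, even though the objects were only actually removed the second time around.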

So what have we learned?

Let's recap! These three articles have been a whistle-stop tour of how Python handles objects in memory. We've looked at how pointers work, why pointer aliasing happens, and whether you'll need to use =, copy or deepcopy. We've also seen what object IDs are and that the is comparator uses them, and how we can override the __eq__ magic method to define our own equality conditions to make == do whatever we want. Finally, we've covered object lifetimes, the __del__ magic method, and how CPython frees objects from memory when they're not needed any more using reference counting and garbage collection.

Now that you know all of these things, you can go forth and write better Python code!

See this article as a talk

Various recorded versions of this talk are available at the following links:

More about Anvil

If you're new here, welcome! Anvil is a platform for building full-stack web apps with nothing but Python. No need to wrestle with JS, HTML, CSS, Python, SQL and all their frameworks – just build it all in Python.

Try Anvil - it's free, forever.
