How Python uses Garbage Collection for Efficient Memory Management
Karishma Shukla
Posted on July 7, 2023
What are variables in Python?
A variable in Python is often assumed to be a container that holds a value. Instead, a variable is a name that references an object, and the object holds the value.
In Python, variables are references.
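A minimal sketch of what this means in practice. The names a, b and c are illustrative only and do not come from the original article.

```python
# Two names bound to the same list object.
a = [1, 2, 3]
b = a                   # binds a second name to the same object; nothing is copied

same_object = a is b    # True: both names reference one object
b.append(4)             # mutate the object through one name...
print(a)                # [1, 2, 3, 4] ...and the change is visible through the other

c = list(a)             # an explicit copy creates a distinct object
print(a is c, a == c)   # False True: different objects, equal values
```

The is operator compares object identity, while == compares values, which is why the copy is equal to the original but is not the same object.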
How are objects stored in Python?
An object can be defined as a block of memory that holds a specific value and supports a specific set of operations.
In Python, everything is an object.
A Python object is stored in memory with names (not variables) and references
- Name - Just a label for an object. An object can have multiple names.
- References - A name referring to an object.
Every object consists of a reference count, a type, and a value.
How variables are stored in memory. Image by author.
References Introduction
The following example assigns a number with value 10 to the num variable:
num = 10
Under the hood, Python stores an integer object of type int in memory (CPython reuses cached objects for small integers such as 10). The variable num references that object's memory address.
To find the memory address of an object referenced by a variable, we can use the built-in id() function.
In CPython, the id() function returns the object's memory address as a base-10 number. We will convert it into hexadecimal using the built-in hex() function.
print(hex(id(num)))
--> 0x7ffdb446d448
Hex representation of a reference’s memory address. Image by author.
Passing arguments in Python functions
In Python, unlike some other languages, arguments are neither passed by value nor passed by reference in the usual sense.
Instead, Python has the concept of pass by assignment or pass by object reference.
When a function is called with an argument, a new reference to the object is created and assigned to the parameter variable in the function. The parameter variable becomes a new reference to the same object in memory, not a copy of the object itself. Any in-place modification made to the object within the function is therefore visible outside the function, while rebinding the parameter to a different object is not.
The value of the reference (the memory address) is passed to the function, not the value of the object itself.
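A short sketch that checks this directly. The function show_identity and the dictionary data are hypothetical names used only for illustration.

```python
def show_identity(param):
    # param is a new name bound to the caller's object, not a copy of it
    return id(param)

data = {"name": "Mary"}

# The id seen inside the function matches the id seen by the caller.
same = show_identity(data) == id(data)
print(same)  # True
```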
Example: The parameter is immutable
Immutable objects include built-in data types like int, float, complex, bool, strings, bytes and tuples.
def f(name):
    name = 'John'

new_name = 'Mary'
f(new_name)
print(new_name)
Output:
Mary
In the above example, both name and new_name point to 'Mary' when the function is called. But when name = 'John' runs, a new string object with the value 'John' is created and name is rebound to it, while new_name still points to 'Mary'. Hence the value of new_name does not change.
Example: The parameter is mutable
Mutable objects include list, dict and set.
def f(students):
    students.append(3)

students = [0,1,2]
f(students)
print(students)
Output:
[0, 1, 2, 3]
In the example above, students is a list, so modifying it in place changes what every variable that points to it sees. Hence students becomes [0, 1, 2, 3].
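Mutability alone does not decide the outcome: what matters is whether the function mutates the object or rebinds its parameter. The sketch below contrasts the two on the same list. The function names rebind and mutate are hypothetical.

```python
def rebind(items):
    items = items + [3]   # builds a new list and rebinds only the local name
    return items

def mutate(items):
    items += [3]          # extends the existing list in place

original = [0, 1, 2]

rebound = rebind(original)
after_rebind = list(original)   # snapshot of the caller's list after rebind()
print(after_rebind, rebound)    # [0, 1, 2] [0, 1, 2, 3]

mutate(original)
print(original)                 # [0, 1, 2, 3]
```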
Garbage Collection
Garbage collection in Python refers to the automatic process of reclaiming memory occupied by objects that are no longer in use. It is a mechanism that manages the allocation and deallocation of memory in Python.
Python uses a garbage collector to automatically detect and remove objects that are no longer referenced or reachable by the program. When an object is no longer needed, the garbage collector identifies it as garbage and frees up the memory occupied by that object.
The two strategies used for garbage collection are
- reference counting
- generational garbage collection
1. Reference Counting
It keeps track of the number of references to each object, and when the count reaches zero, indicating that no references to the object exist, the object is considered garbage and the memory is reclaimed.
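To get the reference count of an object, we can use the built-in ctypes module.
import ctypes

def count_references(address):
    """
    Count the number of references to the object at the given address.
    """
    # CPython-specific: reads the object's header directly; not a supported API.
    return ctypes.c_long.from_address(address).value

# Note: CPython caches small integers such as 15, so other parts of the
# interpreter also reference this object. The printed counts are therefore
# larger than the 1, 2, 1, 0 shown in the diagrams (and do not change at all
# on Python 3.12+, where small integers are immortal).
students = 15
address = id(students)  # remember the address of the 15 object for every step
print(count_references(address))
# Step 1
toppers = students
print(count_references(address))
# Step 2
toppers = 2
print(count_references(address))
# Step 3
students = 1
print(count_references(address))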
Step 1: reference count of students = 2. Image by author.
Step 2: reference count of students = 1. Image by author.
Step 3: The number of references of the integer object with value of 15 will be 0. Image by author.
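Because CPython shares small integers between many parts of the interpreter, the counts are easier to observe on a freshly created object. The sketch below uses sys.getrefcount from the standard library instead of ctypes. The class Roster is a hypothetical placeholder, and the absolute numbers can include temporary references, so only the differences are compared.

```python
import sys

class Roster:
    """Hypothetical class, used only to obtain a fresh, unshared object."""

students = Roster()
base = sys.getrefcount(students)          # baseline (may include a temporary reference)

toppers = students                        # Step 1: a second name references the object
with_alias = sys.getrefcount(students)

toppers = None                            # Step 2: the second name is rebound elsewhere
after_rebind = sys.getrefcount(students)

print(with_alias - base, after_rebind - base)  # 1 0
```

Step 3 of the diagrams (the count reaching zero) cannot be observed this way, since asking for the count requires holding a reference to the object.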
But reference counting alone cannot solve the problem of cyclical references.
What is a cyclical reference?
A cyclical reference, also known as a reference cycle or circular reference, occurs in Python when a group of objects reference each other in a way that forms a closed loop. Reference counting alone cannot reclaim such objects, because their reference counts never reach zero, even after the rest of the program can no longer reach them. Without a separate mechanism, this would lead to memory leaks.
Basic example of cyclical reference:
x = []
x.append(x)
print(x)
In the above example, x refers to itself, which makes it a cyclical reference.
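The effect of a cycle can be observed with a weak reference, which does not keep its target alive. This is a sketch assuming CPython. The class Node is hypothetical and is used because plain lists cannot be weakly referenced.

```python
import gc
import weakref

class Node:
    """Hypothetical class whose instances reference themselves."""
    def __init__(self):
        self.me = self               # the object refers to itself, forming a cycle

was_enabled = gc.isenabled()
gc.disable()                         # keep the cycle detector out of the way for the demo
try:
    node = Node()
    probe = weakref.ref(node)        # weak reference: does not affect the reference count
    del node                         # drop the only external reference

    alive_before = probe() is not None   # the cycle keeps the count above zero
    gc.collect()                         # run the cycle detector explicitly
    alive_after = probe() is not None
finally:
    if was_enabled:
        gc.enable()                  # restore the collector's previous state

print(alive_before, alive_after)     # True False
```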
To solve this problem Python uses Generational Garbage Collection.
2. Generational Garbage Collection
Generational Garbage Collection uses a trace-based garbage collection technique.
Trace-based garbage collection is a technique used in some garbage collection algorithms to identify and collect unreachable objects. It works by following references between objects, starting from a set of root references, and treating every object that can be reached this way as live. Anything that cannot be reached is garbage.
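As a toy illustration of tracing (a simplified model, not CPython's actual algorithm), reachability can be computed over a small graph of names. The graph, the root name "main" and the function reachable are all hypothetical.

```python
def reachable(roots, references):
    """Return the set of names reachable from the roots by following references."""
    seen = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in seen:
            continue                      # already visited; this also handles cycles
        seen.add(name)
        stack.extend(references.get(name, ()))
    return seen

# Each key references the names in its list.
references = {
    "main": ["students"],
    "students": ["boys"],
    "boys": ["students"],   # a cycle, but still reachable from "main"
    "x": ["y"],
    "y": ["x"],             # a cycle with no path from any root
}

live = reachable(["main"], references)
garbage = set(references) - live
print(sorted(live), sorted(garbage))  # ['boys', 'main', 'students'] ['x', 'y']
```

The x and y objects reference each other, so their reference counts would never reach zero, yet tracing identifies them as garbage because no root leads to them.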
Generational Garbage Collection divides objects into different generations based on their age, with the assumption that most objects become garbage relatively quickly after they are created.
The main idea behind Generational Garbage Collection is that younger objects are more likely to become garbage than older objects. Python's garbage collector focuses its efforts on the younger generations, performing frequent garbage collection on them. Older generations are garbage collected less frequently since they are expected to contain objects that have survived multiple collections and are less likely to become garbage.
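The generations can be inspected through the standard gc module. This is a small sketch; the exact generation scheme and the default threshold values vary between CPython versions, so no specific numbers are assumed here.

```python
import gc

thresholds = gc.get_threshold()   # collection thresholds, one value per generation
counts = gc.get_count()           # current collection counters for the generations
print(thresholds, counts)

# Collect only the youngest generation; the return value is the number of
# unreachable objects that were found.
found = gc.collect(0)
print(found)
```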
Generational Garbage Collection helps address the problem of cyclical references by periodically examining objects in different generations and collecting those that are no longer reachable. It detects and breaks cyclical references by identifying unreachable objects through a process known as "mark and sweep."
Generational Garbage Collection thus helps ensure:
- reference cycles do not leak memory
- proper utilization of system resources
- efficient garbage collection
Programmatically interact with Python’s garbage collector
In the example below, we create two classes, Students and Boys, whose instances reference each other, and perform garbage collection using the built-in gc module (Garbage Collector interface).
You should not disable the garbage collector unless it is required. It is disabled here only to show that the cycle survives until it is collected explicitly.
import gc
import ctypes

def count_references(address):
    """
    Count the number of references to the object at the given address.
    """
    return ctypes.c_long.from_address(address).value

def object_exists(obj_id):
    """
    Return True if the object with the given id exists.
    """
    for obj in gc.get_objects():
        if id(obj) == obj_id:
            return True
    return False

class Students:
    def __init__(self):
        self.boys = Boys(self)
        print(f'Students: {hex(id(self))}, Boys: {hex(id(self.boys))}')

class Boys:
    def __init__(self, students):
        self.students = students
        print(f'Boys: {hex(id(self))}, Students: {hex(id(self.students))}')

gc.disable()
students = Students()
students_id = id(students)
boys_id = id(students.boys)
print(f'Number of references to students: {count_references(students_id)}') # 2
print(f'Number of references to boys: {count_references(boys_id)}') # 1
print(f'Does students exist? {object_exists(students_id)}') # True
print(f'Does boys exist? {object_exists(boys_id)}') # True
students = None
print(f'Number of references to students: {count_references(students_id)}') # 1
print(f'Number of references to boys: {count_references(boys_id)}') # 1
print(f'Does students exist? {object_exists(students_id)}') # True
print(f'Does boys exist? {object_exists(boys_id)}') # True
print('Collecting garbage...')
gc.collect()
print(f'Does students exist? {object_exists(students_id)}') # False
print(f'Does boys exist? {object_exists(boys_id)}') # False
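# Caution: both objects have been freed at this point, so these reads look at
# released memory and the values are not guaranteed.
print(f'Number of references to students: {count_references(students_id)}') # 0
print(f'Number of references to boys: {count_references(boys_id)}') # 0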
Output:
Boys: 0x1e18b68c6d0, Students: 0x1e18b698510
Students: 0x1e18b698510, Boys: 0x1e18b68c6d0
Number of references to students: 2
Number of references to boys: 1
Does students exist? True
Does boys exist? True
Number of references to students: 1
Number of references to boys: 1
Does students exist? True
Does boys exist? True
Collecting garbage...
Does students exist? False
Does boys exist? False
Number of references to students: 0
Number of references to boys: 0
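When the collector does need to be paused, as in the example above, one option is to restore its previous state automatically afterwards. The sketch below uses a context manager for this. The helper gc_paused is a hypothetical name and is not part of the gc module.

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    """Hypothetical helper: pause automatic collection, then restore the prior state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

state_before = gc.isenabled()
with gc_paused():
    state_inside = gc.isenabled()   # False while the block runs
state_after = gc.isenabled()        # same as before the block

print(state_inside, state_after == state_before)  # False True
```

Disabling the collector only stops automatic runs; an explicit gc.collect() call still works inside the block.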
Conclusion
Garbage collection in Python helps manage memory efficiently, automatically freeing up resources and preventing memory leaks, so developers can focus on writing code without explicitly managing memory deallocation.