Demystifying Python’s Descriptor Protocol

A lot of modern frameworks and libraries use the "descriptor" protocol to make the process of creating APIs for end-users neat and simple. Let's discuss how the behavior of Python's builtins like property, staticmethod and classmethod can be imitated using the descriptor protocol.

Consider the following example class:

class Person:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def _full_name_getter(self):
        return f'{self.first_name} {self.last_name}'.title()

    def _full_name_setter(self, value):
        first_name, *_, last_name = value.split()
        self.first_name = first_name
        self.last_name = last_name

    full_name = property(fget=_full_name_getter, fset=_full_name_setter)


foo = Person('foo', 'bar')

Whenever we access one of foo's attributes, say foo.first_name, Python looks for first_name in the following places, in order, until it is found:

foo.__dict__
type(foo).__dict__
the __dict__ of each of foo's base classes in the MRO¹ (metaclasses are not consulted)

>>> foo.__dict__
{'first_name': 'foo', 'last_name': 'bar'}
>>> foo.first_name
'foo'
>>> foo.last_name
'bar'
>>> foo.full_name
'Foo Bar'

Notice that the attribute full_name isn't in foo.__dict__. So where did it come from?

The attribute access mechanism described above is incomplete. Before completing it, let's take a detour and look at the descriptor protocol, which is what property, classmethod and staticmethod are built on.

What's the descriptor protocol?

Any object that defines at least one of the methods __get__, __set__ or __delete__ is called a descriptor. The signatures of these methods are:


__get__(self, obj, type=None) -> value

__set__(self, obj, value) -> None

__delete__(self, obj) -> None

There are two types of descriptors: data descriptors and non-data descriptors. If an object defines either __set__ or __delete__, it is called a data descriptor. A non-data descriptor, therefore, defines only __get__ among these three methods. Data and non-data descriptors have different precedence in the attribute lookup chain (more on this later).
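As a rough illustration of that distinction, here is a small sketch. The classify helper is hypothetical (it is not part of the standard library); it only checks which of the three methods the object's type defines.

```python
def classify(obj):
    """Return which kind of descriptor obj is, based on its type's methods."""
    cls = type(obj)  # special methods are looked up on the type, not the instance
    if hasattr(cls, "__set__") or hasattr(cls, "__delete__"):
        return "data descriptor"
    if hasattr(cls, "__get__"):
        return "non-data descriptor"
    return "not a descriptor"


def plain_function():
    pass


print(classify(property()))                    # data descriptor
print(classify(plain_function))                # non-data descriptor
print(classify(staticmethod(plain_function)))  # non-data descriptor
print(classify(42))                            # not a descriptor
```

Plain functions showing up as non-data descriptors is not an accident: that is how they turn into bound methods when accessed through an instance.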

In the Person class, the class attribute full_name is a descriptor. When foo.full_name is accessed, Person.__dict__['full_name'].__get__(foo, Person) gets called, which in turn calls the function that we passed to property as the fget keyword argument.
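We can check this by hand. The sketch below uses a trimmed-down copy of the Person class (getter only) and calls __get__ on the raw descriptor taken from the class namespace.

```python
class Person:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def _full_name_getter(self):
        return f"{self.first_name} {self.last_name}".title()

    full_name = property(fget=_full_name_getter)


foo = Person("foo", "bar")

# Fetch the raw descriptor from the class namespace, bypassing attribute lookup.
descriptor = Person.__dict__["full_name"]
print(type(descriptor).__name__)        # property

# Calling __get__ by hand gives the same result as normal attribute access.
print(descriptor.__get__(foo, Person))  # Foo Bar
print(foo.full_name)                    # Foo Bar
print("full_name" in vars(foo))         # False
```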

So the attribute access mechanism for foo.first_name now is:

Look up first_name in type(foo).__dict__ and in the __dict__ of its base classes in the MRO¹ (metaclasses are not consulted). If the entry found is a data descriptor, the result of its __get__(foo, type(foo)) is returned.
If not, first_name is looked up in foo.__dict__ and returned if present.
Lastly, if the entry found on the class is a non-data descriptor, the result of its __get__(foo, type(foo)) is returned; if it is a plain class attribute, it is returned as is.
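The steps above can be sketched in pure Python. This is a simplified model, not CPython's actual implementation: the lookup and find_in_mro functions are hypothetical helpers, and the sketch ignores __getattr__ and __slots__.

```python
_MISSING = object()


def find_in_mro(cls, name):
    """Return the first entry for name in the __dict__ of cls or its bases."""
    for base in cls.__mro__:
        if name in vars(base):
            return vars(base)[name]
    return _MISSING


def lookup(obj, name):
    """Simplified sketch of instance attribute lookup."""
    cls = type(obj)
    class_attr = find_in_mro(cls, name)
    attr_type = type(class_attr)

    # 1. Data descriptors found on the class win over the instance __dict__.
    if class_attr is not _MISSING and hasattr(attr_type, "__get__") and (
        hasattr(attr_type, "__set__") or hasattr(attr_type, "__delete__")
    ):
        return attr_type.__get__(class_attr, obj, cls)

    # 2. Then the instance's own __dict__.
    if name in vars(obj):
        return vars(obj)[name]

    # 3. Then non-data descriptors, and finally plain class attributes.
    if class_attr is not _MISSING:
        if hasattr(attr_type, "__get__"):
            return attr_type.__get__(class_attr, obj, cls)
        return class_attr

    raise AttributeError(name)


class Person:
    species = "human"

    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @property
    def full_name(self):
        return f"{self.first_name} {self.last_name}".title()

    def greet(self):
        return f"hi, {self.first_name}"


foo = Person("foo", "bar")
print(lookup(foo, "full_name"))   # Foo Bar  (data descriptor)
print(lookup(foo, "first_name"))  # foo      (instance __dict__)
print(lookup(foo, "greet")())     # hi, foo  (non-data descriptor)
print(lookup(foo, "species"))     # human    (plain class attribute)
```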

Note that the first and third steps are very similar. The difference is precedence: a data descriptor takes priority over the instance's __dict__, whereas the instance's __dict__ takes priority over a non-data descriptor. We'll see how the cached property relies on this later in the post.

You might be wondering what orchestrates this lookup mechanism. It's __getattribute__ (not to be confused with __getattr__): when we look up foo.full_name, foo.__getattribute__('full_name') is called, which handles it according to the attribute access mechanism we just described.

It is also important to understand the attribute setting mechanism. Consider this statement: foo.age = 32:

If age is a data descriptor, then type(foo).__dict__['age'].__set__(foo, 32) is called. The descriptor may refuse the assignment by raising AttributeError, as property does when no fset was given.
Otherwise (including when age is a non-data descriptor), an entry is created in foo's __dict__, i.e. foo.__dict__['age'] = 32.
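Here is a minimal sketch of both branches. The Positive descriptor is a hypothetical example, and it uses __set_name__ (Python 3.6+) to learn the attribute name it was assigned to.

```python
class Positive:
    """Data descriptor: defines __set__, so it intercepts assignment."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError(f"{self.name} must be positive")
        instance.__dict__[self.name] = value


class Person:
    age = Positive()


foo = Person()
foo.age = 32          # routed to Positive.__set__(foo, 32)
print(foo.age)        # 32
foo.nickname = "neo"  # no descriptor involved: stored in foo.__dict__
print(vars(foo))      # {'age': 32, 'nickname': 'neo'}

try:
    foo.age = -1
except ValueError as exc:
    print(exc)        # age must be positive
```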

How does the property builtin work?

Let's first see the signature of property.

property(fget=None, fset=None, fdel=None, doc=None)

Although it looks like a function, it's actually a class, and its instances are descriptors because the class defines __get__, __set__, and __delete__.

We know that an attribute which is a descriptor, when accessed on an object say foo, calls its __get__ method with the object and class of the object as arguments, i.e. type(foo).__dict__['attr_name'].__get__(foo, type(foo)). Similarly, when it's being set, then its __set__ method is called with the object and value to be set, i.e. type(foo).__dict__['attr_name'].__set__(foo, value).

Continuing with the opening example:

>>> foo.full_name
'Foo Bar'
>>> # Person.__dict__['full_name'].__get__(foo, Person)
>>> foo.full_name = 'keanu reeves'
>>> # Person.__dict__['full_name'].__set__(foo, 'keanu reeves')
>>> foo.first_name
'keanu'
>>> foo.last_name
'reeves'
>>> foo.full_name
'Keanu Reeves'

Note that when we set foo.full_name = 'keanu reeves', the full_name property's __set__ is called, which in turn calls the _full_name_setter that we passed to property as the fset argument.

We can mimic the property behavior with the following implementation:


class Property:
    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        # Like the builtin, fall back to the getter's docstring.
        if doc is None and fget is not None:
            doc = fget.__doc__
        self.__doc__ = doc

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: return the descriptor.
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(instance)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

    # Each of these returns a new Property, which enables the decorator form.
    def getter(self, fget):
        return type(self)(fget, self.fset, self.fdel, self.__doc__)

    def setter(self, fset):
        return type(self)(self.fget, fset, self.fdel, self.__doc__)

    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)
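The getter, setter and deleter methods are what make the decorator form possible: each one returns a new property object that carries over the functions registered so far. Here is a short sketch using the builtin property; the Temperature class and the getter_only name are made up for illustration.

```python
class Temperature:
    def __init__(self, celsius):
        self._celsius = celsius

    @property
    def celsius(self):
        return self._celsius

    # Keep a reference to the getter-only property before .setter replaces it.
    getter_only = celsius

    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value


t = Temperature(21.0)
t.celsius = 25.0
print(t.celsius)                # 25.0

new = Temperature.__dict__["celsius"]
old = Temperature.__dict__["getter_only"]
print(new is old)               # False: setter() built a new property
print(new.fget is old.fget)     # True: the getter was carried over
print(old.fset)                 # None
```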

How does a cached property work?

The expected behavior of a cached property is that its value is calculated on first access and then stored ('cached'), so that later accesses return the stored value without recalculating it.

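class CachedProperty:

    def __init__(self, function):
        self.function = function

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: return the descriptor.
            return self
        result = self.function(instance)
        # Store the result under the same name so that the instance
        # __dict__ shadows this non-data descriptor from now on.
        instance.__dict__[self.function.__name__] = result
        return result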

Let's now use it:


>>> class Foo:
...     def score(self):
...         print('doing some time-consuming calculations')
...         return 19.5
...
...     score = CachedProperty(score)
...     # you can also use CachedProperty as a decorator
...
>>> foo = Foo()
>>> vars(foo)   # i.e. foo.__dict__
{}

>>> foo.score
doing some time-consuming calculations
19.5

>>> vars(foo)
{'score': 19.5}

>>> foo.score
19.5

Observe that when we first accessed the score attribute on foo, it printed "doing some time-consuming calculations". After foo.score was accessed once, foo.__dict__ was populated with a new entry with the key score. If we access foo.score for a second time now, nothing would be printed — it returns vars(foo)['score'] instead.

Why did that happen?

To answer this, recall the attribute access machinery. When score was accessed for the first time:

Python checked whether score was a data descriptor. It was not.
Next, it looked for a score key in foo.__dict__. It wasn't there.
Finally, it checked whether score was a non-data descriptor on the class. It was, so type(foo).__dict__['score'].__get__(foo, type(foo)) was called, which stored and returned the result.

When score is accessed from the second time onward:

Python checks whether score is a data descriptor. It's not.
The 'score' key is then looked up in foo.__dict__, where it was inserted on the first access, and foo.__dict__['score'] is returned.
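The caching therefore depends on the descriptor being a non-data descriptor. The sketch below contrasts the two cases. NonDataCached mirrors the CachedProperty above, while DataCached is a hypothetical variant that adds __set__ and so defeats the cache.

```python
class NonDataCached:
    def __init__(self, function):
        self.function = function

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        result = self.function(instance)
        instance.__dict__[self.function.__name__] = result
        return result


class DataCached(NonDataCached):
    # Defining __set__ turns this into a data descriptor, so it now
    # outranks the instance __dict__ and __get__ runs on every access.
    def __set__(self, instance, value):
        raise AttributeError("read-only")


calls = []


class Foo:
    @NonDataCached
    def fast(self):
        calls.append("fast")
        return 1

    @DataCached
    def slow(self):
        calls.append("slow")
        return 2


foo = Foo()
foo.fast, foo.fast  # computed once, then served from foo.__dict__
foo.slow, foo.slow  # computed twice: the data descriptor always wins
print(calls)        # ['fast', 'slow', 'slow']
```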

A cached property is particularly useful when, for example, you have a Django model class with a property that runs a time-consuming query. True to Django's "batteries included" philosophy, it provides django.utils.functional.cached_property for this use case.
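Outside Django, the standard library has offered functools.cached_property since Python 3.8. Here is a brief sketch of how it behaves; the Report class is a made-up example.

```python
from functools import cached_property  # available since Python 3.8


class Report:
    def __init__(self, rows):
        self.rows = rows
        self.computations = 0

    @cached_property
    def total(self):
        self.computations += 1
        return sum(self.rows)


report = Report([1, 2, 3])
print(report.total, report.total)  # 6 6
print(report.computations)         # 1
print("total" in vars(report))     # True

del report.total                   # dropping the cached entry forces a recompute
print(report.total)                # 6
print(report.computations)         # 2
```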

How do staticmethod and classmethod work?

staticmethod converts a function into a static method: a method that does not receive an implicit first argument. Let's implement it using the descriptor protocol:

class StaticMethod:
    def __init__(self, function):
        self.function = function

    def __get__(self, instance, owner):
        return self.function
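A quick usage sketch, repeating the class so the example runs on its own. MathUtils is a hypothetical example class.

```python
class StaticMethod:
    def __init__(self, function):
        self.function = function

    def __get__(self, instance, owner=None):
        # Hand back the plain function: nothing is bound to it.
        return self.function


class MathUtils:
    @StaticMethod
    def add(a, b):
        return a + b


print(MathUtils.add(2, 3))               # 5
print(MathUtils().add(2, 3))             # 5
print(MathUtils.add is MathUtils().add)  # True
```

Whether add is accessed through the class or through an instance, the same undecorated function comes back, so no self is ever passed.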

Similarly, the descriptor API can be used to implement classmethod, whose decorated methods receive the class object as the first argument:

class ClassMethod:
    def __init__(self, function):
        self.function = function

    def __get__(self, instance, owner=None):
        # owner may be omitted when __get__ is called by hand.
        cls = owner if owner is not None else type(instance)

        def wrapper(*args, **kwargs):
            return self.function(cls, *args, **kwargs)
        return wrapper
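Here is a usage sketch with a subclass, repeating the descriptor so the example runs on its own. Shape and Polygon are hypothetical example classes; the point is that cls is whichever class the method was accessed through.

```python
class ClassMethod:
    def __init__(self, function):
        self.function = function

    def __get__(self, instance, owner=None):
        cls = owner if owner is not None else type(instance)

        def wrapper(*args, **kwargs):
            return self.function(cls, *args, **kwargs)

        return wrapper


class Shape:
    def __init__(self, sides):
        self.sides = sides

    @ClassMethod
    def triangle(cls):
        return cls(3)


class Polygon(Shape):
    pass


print(type(Shape.triangle()).__name__)       # Shape
print(type(Polygon.triangle()).__name__)     # Polygon
print(type(Polygon(5).triangle()).__name__)  # Polygon
```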

We've used the descriptor protocol to understand how builtins like staticmethod, classmethod and property work, and how we can implement something like CachedProperty ourselves. Note that the CachedProperty we implemented is not a hack: Python 3 provides these APIs so that developers can customize attribute access as and when needed.

Helpful links:

¹ Method Resolution Order

Karan Suthar
