Functional programming tips for Python
Kimmo Sääskilahti
Posted on November 29, 2020
Hi! In this post, I'd like to present my favourite functional programming techniques (FP) for Python. I'm a big fan of FP as I've found that by following FP principles I can write code that is more readable and easier to debug. Python is not a functional programming language (and it never will be), but I think there are still many things we can learn from languages such as Haskell that are beneficial also in Python.
In short, here are the tips:
- Use list and dictionary comprehensions
- Use generator expressions and
itertools
- Use frozen dataclasses
- Keep your lists and dictionaries immutable
- Use type union instead of inheritance
Use list and dictionary comprehensions
The first technique is to use list comprehensions to create lists and to perform operations such as map
and filter
on lists.
In Haskell, we would use list comprehensions as follows:
ghci> [x*2 | x <- [1..10], x*2 >= 12]
[12,14,16,18,20]
Here we're traversing the list [1..10]
by filtering values by x*2 >= 12
and mapping with x*2
.
In Python, we would achieve the same with
>>> [2*x for x in range(1, 11) if 2*x >= 12]
[12, 14, 16, 18, 20]
The more "imperative" version of the above would be:
a = []
for i in range(1, 11):
val = 2*x
if val >= 12:
a.append(val)
print(a)
One can also perform flatmap
-like operations with nested list comprehensions:
>>> xss = [[1, 2], [3, 4], [5, 6]]
>>> [2*x for xs in xss for x in xs]
[2, 4, 6, 8, 10, 12]
One problem I often face with list comprehensions in Python is that there's no let
construct. This construct would be useful for improving code readability and to avoid re-computing intermediate values by binding them to variables. For example, in Haskell you could do:
ghci> [y | x <- [1..50], let y=x*x, y `mod` 5 == 0]
[25,100,225,400,625,900,1225,1600,2025,2500]
Here we avoid computing x*x
twice for every element by binding the value to variable y
.
In Python, we could write
>>> [x*x for x in range(1, 51) if x*x % 5 == 0]
[25, 100, 225, 400, 625, 900, 1225, 1600, 2025, 2500]
but this would compute x*x
twice for every element. We can circumvent this by using an auxiliary one-element tuple as follows:
>>> [y for x in range(1, 51) for y in (x*x,) if y % 5 == 0]
[25, 100, 225, 400, 625, 900, 1225, 1600, 2025, 2500]
It's not as readable as in Haskell, but sometimes it feels like the best choice. We can make the construct more readable by splitting it into multiple lines:
ys = [x_squared for x in range(1, 51)
for x_squared in (x*x,) # let x_squared = x*x
if x_squared % 5 == 0]
Use generator expressions and itertools
In Haskell, expressions are evaluted lazily so it's completely natural to work with e.g. lists containing all integers. Such infinite lists are never supposed to be evaluated in full, but the values are created only when they are actually needed.
Python has excellent support for such "iterators" via generator expressions and functions. The itertools
package contains many useful functions for working with iterators.
As an example, let's consider the first problem in Project Euler. Sharing solutions of Project Euler problems publicly is strongly discouraged, but since the internet already is full of answers for the first problem, I'll make an exception here. Here's the problem:
If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. Find the sum of all the multiples of 3 or 5 below 1000.
We could solve this with iterators as follows:
>>> from itertools import count, takewhile
>>> ints = count(1) # Iterator of all integers
>>> multiples = (val for val in ints if val % 3 == 0 or val % 5 == 0) # All integers that are multiples of 3 or five
>>> multiples_below_1000 = takewhile(lambda val: val < 1000, multiples) # All such integers below 1000
>>> sum(multiples_below_1000) # Sum of all such values, this is the evaluation step
233168
This isn't the most efficient solution to the problem in Python, but I find it very easy to read and reason about.
Use frozen dataclasses
In pure functional languages such as Haskell, one cannot mutate objects in-place. Instead, one must compose a program out of pure functions that do not mutate their inputs. Code becomes essentially a pipeline, where immutable data structures flow from one transformation to the next. This kind of thinking is also encouraged in the wonderful Pragmatic Programmer book:
"Thinking of code as a series of (nested) transformations can be a liberating approach to programming. It takes a while to get used to, but once you've developed the habit you'll find your code becomes cleaner, your functions shorter, and your designs flatter. Give it a try."
Now, it does not make much sense to try to write purely functional programs with Python, but I think the idea of favouring "transformative" code with immutable objects is a good idea. Using immutable objects also eliminates one whole class of potential bugs.
For plain data structures, Python's built-in dataclasses are a good choice for immutable containers. With frozen=True
, we can ensure the properties in our dataclasses cannot be mutated after their creation:
from dataclasses import dataclass
@dataclass(frozen=True)
class Doggie:
name: str
age: int
If we want to change a Doggie
's name, we can use replace
:
from dataclasses import replace
def change_doggie_name(doggie: Doggie, new_name: str) -> Doggie:
return replace(doggie, name=new_name)
Creating a new object every time will incur a performance penalty, so it's up to you to decide if the overhead is too large for your program.
It's important to remember that frozen dataclasses aren't truly mutable, as we can still mutate all mutable data structures within our class:
@dataclass(frozen=True)
class Doggie:
name: str
age: int
attributes: dict
doggie = Doggie(name="Ben", age=10, attributes={})
doggie.attributes["colour"] = "brown" # In-place mutation
We can avoid this by using frozen dataclasses as properties:
@dataclass(frozen=True)
class DoggieAttributes:
colour: str
@dataclass(frozen=True)
class Doggie:
name: str
age: int
attributes: DoggieAttributes
doggie.attributes.colour = "black" # NOT ALLOWED
Changing a doggie's name can then be achieved by chaining replace
calls:
black_doggie = replace(doggie, attributes=replace(doggie.attributes, colour="black"))
With deeply nested structures, using replace
can become tedious. In functional programming languages, this problem of "mutating" deeply nested structures is elegantly handled by optics libraries such as lens
in Haskell. Python has its own lenses
package, but I haven't found it to be that useful out there in the real world: without the power of static type checking, overuse of lenses can result in very cryptic code.
With that said, most data structures I've encountered are at most two to three levels deep and using replace
isn't an issue, especially when using helper functions such as
def change_doggie_colour(doggie: Doggie, new_colour: str):
return replace(doggie,
attributes=replace(doggie.attributes,
colour=new_colour))
Keep your lists and dictionaries immutable
Continuing on the topic of immutability, we still need to work with mutable lists and dictionaries in Python code. Just because they are mutable, that doesn't mean we need to mutate them. Whenever I find myself trying to mutate a list in-place with l.append(value)
or a dictionary with d["key"] = value
, I stop and think if it's really necessary. Such an operation would not be available in pure functional languages such as Haskell, so it's definitely possible to write code without mutating objects in-place. Could I avoid it here?
If I'm creating a new list or dictionary, maybe I could use list or dictionary comprehensions instead. If I need to add a value to an existing list, it may be better to simply create a whole new new object with the help of positional expansion:
new_list = [*old_list, value]
Similarly, instead of adding a key to an existing dictionary, maybe I can just create a new dictionary with keyword expansion:
new_dict = { **old_dict, "key": value }
One advantage in treating our lists as read-only is that if we use Python's typing
system (we should!) for static type checking, we can use the read-only typing.Sequence
type for our lists instead of the mutable typing.List
type. Because typing.Sequence[T]
is a read-only collection for values of type T
, it's covariant in T
:
typing.Sequence[A] <: typing.Sequence[B] if A <: B
This means that we can use typing.Sequence[A]
where-ever typing.Sequence[B]
is expected, as long as A
is a subtype of B
. For similar reasons, it's better to use the read-only typing.Mapping
for dictionaries instead of typing.Dict
.
It's up to you to decide if creating new objects instead of mutating existing ones is the right solution for your program. In performance-critical cases, it's most likely better to simply mutate lists in-place instead of suffering the overhead of creating new objects. Similarly, if creating the list includes side-effects such as writing to a database, it's better to avoid list comprehensions for readability's sake.
Use type union instead of inheritance
The Pragmatic Programmer book has an interesting paragraph on inheritance:
"Do you program in an object-oriented language? Do you use inheritance? If so, stop! It probably isn't what you want to do."
How to avoid inheritance in Python? Let's assume you'd like to work with objects of type Doggie
and Cat
. Both Doggie
and Cat
have a say()
method and you want to put such animals in containers such as (immutable) lists. The classic Programming 101 solution would be to create a super-class Animal
and inherit Doggie
and Cat
from Animal
:
import abc
class Animal:
def say(self) -> str:
raise NotImplementedError()
class Doggie(Animal):
def say(self) -> str:
return "Woof!"
class Cat(Animal):
def say(self) -> str:
return "Meow!"
Now we get classic polymorphism:
import typing
def all_animals_say(animals: typing.Sequence[Animal]):
for animal in animals:
animal.say()
However, we can achieve the same without any of the problems of inheritance by using a type union:
BetterAnimal = typing.Union[Doggie, Cat]
def all_better_animals_say(animals: typing.Sequence[BetterAnimal]):
for animal in animals:
print(animal.say())
If any class within the BetterAnimal
doesn't have the say()
method of proper type, the type-checker will complain.
Type unions only cover one class of use cases for inheritance, so don't expect you can always get rid of inheritance via type unions. The above example also wasn't a good example of "bad inheritance", as we could have made Animal
an "interface" by turning it into an abstract class with abstract methods and without any hard-coded behaviour. The point I'm trying to make is: always stop to think before using inheritance.
Conclusion
This concludes my list of functional programming tips for Python. If you have any other tips, comments or questions, please leave a comment! Thanks for reading!
Posted on November 29, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.