Pythonic Techniques for Handling Sequences
Nadim Jendoubi
Posted on May 22, 2023
Sequences and Collections are an integral part of any programming language. Python particularly handles them uniformly, i.e. any type of sequence from Strings and XML elements to arrays, lists, and Tuples are handled in a uniform way.
Understanding the variety of these sequences will spare us the reinvention of the wheel as the existing common sequence interface allows us to leverage and support any new sequence types.
The built-in sequences and how to group them
There are 2 types of sequences:
The container sequences: these hold items of different types, including nested containers. Like list, tuple, and double-ended queues.
The flat sequences: these hold items of simple types. Like str and byte.
A container sequence holds references to the objects it contains, which may be of any type, while a flat sequence stores the value of its contents in its own memory space, not as distinct Python objects.
Another way to group sequences is Mutability VS Immutability:
Mutable sequences: sequences that can change their values, for example, list, bytearray, array.array, and collections.deque.
Immutable sequences: sequences that can not change their values, for example, tuple, str, and bytes.
Let’s see what we can learn about Sequences! ✌️
List Comprehensions
ListComp is a very concise way to create a list from an existing list, sometimes by operating on the existing items, using a simpler cleaner more compact syntax, all of this without having to deal with Python lambda.
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlistUsingForLoop = []
newlistUsingListComp = []
# using traditional for loop
for x in fruits:
if "a" in x:
newlistUsingForLoop.append(x)
print(newlistUsingForLoop) # ['apple', 'banana', 'mango']
# using ListComp
newlistUsingListComp = [x for x in fruits if "a" in x]
print(newlistUsingListComp) # ['apple', 'banana', 'mango']
# we can notice how easily readable the list comprehension version
Generator Expressions
A generator expression(GenExps) is an expression that returns a generator object, i.e. a function that contains a yield statement and returns a generator object.
GenExps use the same syntax as ListComps, but are enclosed in parentheses rather than brackets.
# create the generator object
squares_generator = (i * i for i in range(5))
# iterate over the generator and print the values
for i in squares_generator:
print(i)
# this will output the square of numbers from 0 to 4
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
for tshirt in (f'{c} {s}' for c in colors for s in sizes):
print(tshirt)
# black S
# black M
# black L
# white S
# white M
# white L
Tuples are Not Just Immutable Lists
Python presents Tuples as Immutable Lists, only this does not do them justice as they can be used as immutable lists and also as records with no field names.
- Tuples as records:
Tuples can serve as temporary records with no field names, this can be done only if the number of items is fixed and the order of items is respected as it is important.
coordinates = [(33.9425, -118.408056), (31.9425, -178.408056)]
for lat, _ in coordinates:
print(f 'cordinate latitude: {lat}')
# this will print the latitude each time ignoring the longitude value
There are two ways of creating tuples with named fields but in this instance, they will be regarded as data classes and it is not what we want to discuss here.
- Tuples as immutable lists:
Tuples are also highly used in Python standard library for their immutability which brings clarity and performance optimization to the code.
a = (10, 'alpha', [1, 2])
b = (10, 'alpha', [1, 2])
print(a == b)
# True
b[-1].append(99)
print(a == b)
# False
print(b)
# (10, 'alpha', [1, 2, 99])
There is one caveat to mention, as portrayed above in the code example, only the references contained in the tuple are immutable, the objects held in the references can change their values.
Changing the value of an item in a tuple can lead to serious bugs as tuples are hashable, it is better to use a different data structure for your specific use case.
Unpacking Sequences and Iterables
Unpacking is an operation that consists of assigning an iterable of values to a list of variables in a single assignment statement, it avoids unnecessary and error-prone use of indexes to extract elements from sequences.
coordinates = (33.9425, -118.408056)
latitude, longitude = coordinates # unpacking
print(latitude)
# 33.9425
print(longitude)
# -118.408056
We can use the excess operator *
for tuples and the excess operator **
for dictionaries when unpacking too.
The target of unpacking can use nesting, i.e. something like (a, b, (c, d)). Python will do the right thing if the value has the same nesting structure.
# Tuple Example
fruits = ("apple", "mango", "papaya", "pineapple", "cherry")
(green, *tropic, red) = fruits
print(green)
# apple
print(tropic)
# ['mango', 'papaya', 'pineapple']
print(red)
# cherry
# Dictionary Example
def myFish(**fish):
for name, value in fish.items():
print(f'I have {value} {name}')
fish = {
'guppies': 2,
'zebras' : 5,
'bettas': 10
}
myFish(**fish)
# I have 2 guppies
# I have 5 zebras
# I have 10 bettas
# Nested Unpacking Example
metro_areas = [
('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),
('São Paulo', 'BR', 19.649, (-23.547778, -46.635833)),
]
print(f'{"":15} | {"latitude":>9} | {"longitude":>9}')
for name, _, _, (lat, lon) in metro_areas:
if lon <= 0:
print(f'{name:15} | {lat:9.4f} | {lon:9.4f}')
# | latitude | longitude
# Mexico City | 19.4333 | -99.1333
# New York-Newark | 40.8086 | -74.0204
# São Paulo | -23.5478 | -46.6358
Sequence Pattern Matching
Pattern matching is done with the match/case statement. It is very similar to the if-elif-else statement only cleaner more readable and wields the power of Destructuring.
Destructuring is a more advanced form of unpacking, as it allows writing sequence patterns as tuples or lists or any combination of both.
Here’s a nice example I found on the GUICommits website that could simplify pattern matching with sequences.
baskets = [
["apple", "pear", "banana"],
["chocolate", "strawberry"],
["chocolate", "banana"],
["chocolate", "pineapple"],
["apple", "pear", "banana", "chocolate"],
]
match basket:
# Matches any 3 items
case [i1, i2, i3]:
print(f"Wow, your basket is full with: '{i1}', '{i2}' and '{i3}'")
# Matches >= 4 items
case [_, _, _, *_] as basket_items:
print(f"Wow, your basket has so many items: {len(basket_items)}")
# 2 items. First should be chocolate, second should be strawberry or banana
case ["chocolate", "strawberry" | "banana"]:
print("This is a superb combination")
# Any amount of items starting with chocolate
case ["chocolate", *_]:
print("I don't know what you plan but it looks delicious")
# If nothing matched before
case _:
print("Don't be cheap, buy something else")
One thing to point out, it’s that the *_
matches any number of items, without binding them to a variable. Using *
extra instead of *_
would bind the items to extra as a list with 0 or more items.
Time to Slice them Up
A common feature of list, tuple, str, and all sequence types in Python is the support of slicing operations.
We slice a list like this: seq[start, stop, step]
, step is the number of items to skip. To evaluate the expression seq[start:stop:step]
, Python calls The Special Method seq.getitem(slice(start, stop, step))
.
# On a list
l = [10, 20, 30, 40, 50, 60]
l[:2] # [10, 20]
l[2:] # [30, 40, 50, 60]
l[:3] # [10, 20, 30]
# On a str
s = 'bicycle'
s[::3] # 'bye'
s[::-1] # 'elcycib'
s[::-2] # 'eccb'
# Add, Mul, Del, assign operations on slices
l = list(range(10)) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
l[2:5] = [20, 30] # [0, 1, 20, 30, 5, 6, 7, 8, 9]
del l[5:7] # [0, 1, 20, 30, 5, 8, 9]
print(5 * 'abcd') # 'abcdabcdabcdabcdabcd'
There are different uses for different types of sequences, and depending on your use case, there are better options. Arrays, Memory views and Double ended queues are prime examples of that.
Memory Views
The built-in memoryview class is a shared-memory sequence type that lets you handle slices of arrays without copying bytes. Using notation similar to the array module, the memoryview.cast method lets you change the way multiple bytes are read or written as units without moving bits around.
The memoryview.cast
returns yet another memoryview
object, always sharing the same memory.
octets = array('B', range(6)) # array of 6 bytes (typecode 'B')
m1 = memoryview(octets)
m1.tolist() # [0, 1, 2, 3, 4, 5]
m2 = m1.cast('B', [2, 3])
m2.tolist() # [[0, 1, 2], [3, 4, 5]]
m3 = m1.cast('B', [3, 2])
m3.tolist() # [[0, 1], [2, 3], [4, 5]]
m2[1,1] = 22
m3[1,1] = 33
print(octets) # array('B', [0, 1, 2, 33, 22, 5])
Posted on May 22, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.