A Comprehensive Guide to Slicing in Python
Bas Steins
Posted on January 31, 2022
In Python, some objects like strs or lists can be sliced. For example, you can get the first element of a list or a string with:
my_list = [1,2,3]
print(my_list[0]) # 1
my_string = "Python"
print(my_string[0]) # P
Python uses square brackets ([ and ]) to access single elements of objects that can be decomposed into parts.
However, there is more to the inside of these square brackets than just accessing individual elements:
Negative Indexing
Perhaps you already know that you can use negative indices in Python like so:
my_list = list("Python")
print(my_list[-1])
Something like my_list[-1] represents the last element of a list, my_list[-2] represents the second-to-last element, and so on.
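To make the rule concrete: a negative index -k refers to the same element as len(my_list) - k. The sketch below spells that out. The helper name normalize_index is made up for this illustration and is not part of Python.

```python
def normalize_index(index, length):
    """Translate a possibly negative index into its non-negative equivalent."""
    if not -length <= index < length:
        raise IndexError("index out of range")
    return index + length if index < 0 else index


my_list = list("Python")
for k in range(1, len(my_list) + 1):
    # my_list[-k] and my_list[len(my_list) - k] are the same element
    assert my_list[-k] == my_list[normalize_index(-k, len(my_list))]

print(my_list[-1], my_list[-2])  # n o
```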
The Colon
What if you want to retrieve more than one element from a list? Say you want everything from start to end except for the very last one. In Python, no problemo:
my_list = list("Python")
print(my_list[0:-1])
Or, what if you want every element at an even index of your list, i.e. elements 0, 2, etc.?
For this we would need to go from the first element to the last element but skip every second item. We could write that as:
my_list = list("Python")
print(my_list[0:len(my_list):2]) # ['P', 't', 'o']
The slice Object
Behind the scenes, the colon syntax we use to access items of a list-like object is turned into an object that consists of three values: (start, stop, step). These objects are called slice objects and can be manually created with the built-in slice function.
We can check if the two are indeed the same:
my_list = list("Python")
start = 0
stop = len(my_list)
step = 2
slice_object = slice(start, stop, step)
print(my_list[start:stop:step] == my_list[slice_object]) # True
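One practical consequence is that a slice can be stored under a name and reused. The following is only a sketch: the record string and the field layout (YEAR, MONTH, DAY) are invented for this example.

```python
record = "2022-01-31 slicing guide"

# Hypothetical field layout, chosen only for this illustration.
YEAR = slice(0, 4)
MONTH = slice(5, 7)
DAY = slice(8, 10)

year, month, day = record[YEAR], record[MONTH], record[DAY]
print(year, month, day)  # 2022 01 31

# A slice object exposes its three values as read-only attributes.
print(YEAR.start, YEAR.stop, YEAR.step)  # 0 4 None
```

Naming the slices keeps the magic numbers in one place, which can make fixed-width parsing code easier to read.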
Have a look at the graphic above. The letter P is the first element in our list, thus it can be indexed by 0 (see the numbers in the green boxes). The list has a length of 6, and therefore, the first element can alternatively be indexed by -6 (negative indexing is shown in the blue boxes).
The numbers in the green and blue boxes identify single elements of the list. Now, look at the numbers in the orange boxes. These determine the slice indices of the list. If we use the slice's start and stop, every element between these numbers is covered by the slice. Some examples:
"Python"[0:1] # P
"Python"[0:5] # Pytho
That's just an easy way to remember that the start value is inclusive and the stop value is exclusive.
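The inclusive start and exclusive stop have a few handy consequences, sketched below on the same string. The variable names are chosen just for this example.

```python
s = "Python"

# With 0 <= start <= stop <= len(s), the slice holds stop - start items.
start, stop = 1, 4
assert len(s[start:stop]) == stop - start  # the slice is 'yth'

# Splitting at any position k and joining again restores the original.
for k in range(len(s) + 1):
    assert s[:k] + s[k:] == s

# Unlike single-item indexing, out-of-range slice bounds are clamped
# instead of raising an IndexError.
print(s[2:100])        # thon
print(s[10:20] == "")  # True
```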
Sane defaults
Most of the time, you want to slice your list by
- starting at 0
- stopping at the end
- stepping with a width of 1
Therefore, these are the default values and can be omitted in our : syntax:
print(my_list[0:-1] == my_list[:-1])
print(my_list[0:len(my_list):2] == my_list[::2])
Technically, whenever we omit a number between colons, the omitted ones will have the value of None.
And in turn, the slice object will replace None with
- 0 for the start value
- len(list) for the stop value
- 1 for the step value
However, if the step value is negative, the Nones are replaced with
- -1 for the start value
- -len(list) - 1 for the stop value
For example, "Python"[::-1] is technically the same as "Python"[-1:-7:-1].
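The default rules can be checked directly. This is a small sketch on the string "Python"; the variable names are ours.

```python
s = "Python"
n = len(s)

# Positive step: None stands for 0 (start), len(s) (stop) and 1 (step).
assert s[::] == s[None:None:None] == s[0:n:1]

# Negative step: None stands for -1 (start) and -len(s) - 1 (stop).
reversed_default = s[::-1]
reversed_explicit = s[-1:-n - 1:-1]
assert reversed_default == reversed_explicit == "nohtyP"

# The same defaults apply for other negative steps.
assert s[::-2] == s[-1:-n - 1:-2] == "nhy"
```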
Special Case: Copy
There is a special case for slicing which can sometimes be used as a shortcut: if you use just the default values, i.e. my_list[:], it will give you the exact same items:
my_list = list("Python")
my_list_2 = my_list[:]
print(my_list == my_list_2)  # True
The elements in the list are indeed the same. However, the list object is not. We can check that by using the id builtin:
print(id(my_list))
print(id(my_list_2))
Note that every slice operation on a list returns a new list object. A (shallow) copy of our list is created when using just [:].
Here are two code snippets to illustrate the difference:
a = list("Python")
b = a
a[-1] = "N"
print(a)
# ['P', 'y', 't', 'h', 'o', 'N']
print(b)
# ['P', 'y', 't', 'h', 'o', 'N']
a = list("Python")
b = a[:]
a[-1] = "N"
print(a)
# ['P', 'y', 't', 'h', 'o', 'N']
print(b)
# ['P', 'y', 't', 'h', 'o', 'n']
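One caveat worth knowing: the copy made by [:] is shallow. The new list is a separate object, but it refers to the same items. With nested lists, this means inner lists are shared, as this small sketch shows (the data is made up for the example):

```python
a = [["P", "y"], ["t", "h"]]
b = a[:]  # new outer list, same inner lists

a.append(["o", "n"])  # changes only the outer list `a`
a[0][0] = "J"         # mutates an inner list shared by `a` and `b`

print(a)  # [['J', 'y'], ['t', 'h'], ['o', 'n']]
print(b)  # [['J', 'y'], ['t', 'h']]
print(a[0] is b[0])  # True
```

If fully independent copies of nested data are needed, copy.deepcopy from the standard library is one option.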
Examples
Some often used examples:
Use case | Python Code |
---|---|
Every element | no slice, or [:] for a copy |
Every second element | [::2] (even) or [1::2] (odd) |
Every element but the first one | [1:] |
Every element but the last one | [:-1] |
Every element but the first and the last one | [1:-1] |
Every element in reverse order | [::-1] |
Every element but the first and the last one in reverse order | [-2:0:-1] |
Every second element but the first and the last one in reverse order | [-2:0:-2] |
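To see what each row of the table produces, here is a quick sketch that applies the slices to the string "Python". The dictionary is only there to print the results side by side.

```python
s = "Python"

# Each entry pairs a slice from the table with its result on "Python".
results = {
    "[:]": s[:],              # 'Python'
    "[::2]": s[::2],          # 'Pto'
    "[1::2]": s[1::2],        # 'yhn'
    "[1:]": s[1:],            # 'ython'
    "[:-1]": s[:-1],          # 'Pytho'
    "[1:-1]": s[1:-1],        # 'ytho'
    "[::-1]": s[::-1],        # 'nohtyP'
    "[-2:0:-1]": s[-2:0:-1],  # 'ohty'
    "[-2:0:-2]": s[-2:0:-2],  # 'ot'
}

for expression, value in results.items():
    print(f"s{expression:<10} -> {value!r}")
```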
Assignments
p = list("Python")
# ['P', 'y', 't', 'h', 'o', 'n']
p[1:-1]
# ['y', 't', 'h', 'o']
p[1:-1] = 'x'
print(p)
# ['P', 'x', 'n']
p = list("Python")
p[1:-1] = ['x'] * 4
p
# ['P', 'x', 'x', 'x', 'x', 'n']
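The two snippets above show that a plain slice can be replaced by a different number of items. As a sketch of some related list behaviour: the right-hand side may be any iterable, a slice with a step other than 1 needs exactly as many items as it covers, and del removes a slice in place. The variable names q and r are ours.

```python
q = list("Python")
q[1:-1] = "abc"  # any iterable works; a string contributes its characters
print(q)  # ['P', 'a', 'b', 'c', 'n']

r = list("Python")
r[::2] = ["x", "x", "x"]  # extended slice: exactly three positions to fill
print(r)  # ['x', 'y', 'x', 'h', 'x', 'n']

try:
    r[::2] = ["z"]  # one item for three positions does not fit
except ValueError as error:
    print("ValueError:", error)

del r[1:-1]  # deleting a slice removes its items in place
print(r)  # ['x', 'n']
```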
Understanding the loop
Every slice object in Python has an indices method. This method will return a triple of (start, stop, step) with which you could rebuild a loop equivalent to the slicing operation.
Sounds complicated? Let's have a closer look:
Let's start with a sequence:
sequence = list("Python")
Then, we create a slice object. Let's take every second element, i.e. [::2].
my_slice = slice(None, None, 2) # equivalent to `[::2]`.
Since we're using Nones, the slice object needs to calculate the actual index values based on the length of our sequence. Therefore, to get our index triple, we need to pass the length to the indices method, like so:
indices = my_slice.indices(len(sequence))
This will give us the triple (0, 6, 2). We now can recreate the loop like so:
sequence = list("Python")
start = 0
stop = 6
step = 2
i = start
# Compare with < rather than != so the loop also ends if a step jumps past stop
while i < stop:
    print(sequence[i])
    i = i + step
This accesses the same elements of our list as the slice itself would do.
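The same idea can be generalised to any slice, including negative steps. The function below is a sketch rather than how Python implements slicing internally; the name slice_loop is made up for this example.

```python
def slice_loop(sequence, slice_object):
    """Collect the items a slice would select, using an explicit loop."""
    start, stop, step = slice_object.indices(len(sequence))
    result = []
    i = start
    # For a negative step the loop counts down, so the comparison flips.
    # Note that `stop` can be -1 here; it is a loop bound, not a list index.
    while (i < stop) if step > 0 else (i > stop):
        result.append(sequence[i])
        i += step
    return result


sequence = list("Python")
for s in (slice(None, None, 2), slice(None, None, -1), slice(1, -1), slice(-2, 0, -2)):
    assert slice_loop(sequence, s) == sequence[s]
    # The triple also fits the built-in range() directly.
    assert [sequence[i] for i in range(*s.indices(len(sequence)))] == sequence[s]

print(slice_loop(sequence, slice(None, None, -1)))  # ['n', 'o', 'h', 't', 'y', 'P']
```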
Making Own Classes Sliceable
Python wouldn't be Python if you could not use the slice object in your own classes.
Even better, slices do not need to be numerical values. We could build an address book which is sliceable by alphabetical indices.
import string
class AddressBook:
    def __init__(self):
        self.addresses = []
    def add_address(self, name, address):
        self.addresses.append((name, address))
    def get_addresses_by_first_letters(self, letters):
        letters = letters.upper()
        return [(name, address) for name, address in self.addresses if any(name.upper().startswith(letter) for letter in letters)]
    def __getitem__(self, key):
        if isinstance(key, str):
            return self.get_addresses_by_first_letters(key)
        if isinstance(key, slice):
            start, stop, step = key.start, key.stop, key.step
            letters = (string.ascii_uppercase[string.ascii_uppercase.index(start):string.ascii_uppercase.index(stop)+1:step])
            return self.get_addresses_by_first_letters(letters)
address_book = AddressBook()
address_book.add_address("Sherlock Holmes", "221B Baker St., London")
address_book.add_address("Wallace and Gromit", "62 West Wallaby Street, Wigan, Lancashire")
address_book.add_address("Peter Wimsey", "110a Piccadilly, London")
address_book.add_address("Al Bundy", "9764 Jeopardy Lane, Chicago, Illinois")
address_book.add_address("John Dolittle", "Oxenthorpe Road, Puddleby-on-the-Marsh, Slopshire, England")
address_book.add_address("Spongebob Squarepants", "124 Conch Street, Bikini Bottom, Pacific Ocean")
address_book.add_address("Hercule Poirot", "Apt. 56B, Whitehaven Mansions, Sandhurst Square, London W1")
address_book.add_address("Bart Simpson", "742 Evergreen Terrace, Springfield, USA")
print(string.ascii_uppercase)
print(string.ascii_uppercase.index("A"))
print(string.ascii_uppercase.index("Z"))
print(address_book["A"])
print(address_book["B"])
print(address_book["S"])
print(address_book["A":"H"])
Explanation
The get_addresses_by_first_letters method
def get_addresses_by_first_letters(self, letters):
    letters = letters.upper()
    return [(name, address) for name, address in self.addresses if any(name.upper().startswith(letter) for letter in letters)]
This method filters all addresses belonging to a name starting with any letter in the letters argument. First, we make the function case insensitive by converting our letters to uppercase. Then, we use a list comprehension over our internal addresses list. The condition inside the list comprehension tests if any of the provided letters matches the first letter of the corresponding name value.
The __getitem__ method
To make our AddressBook objects sliceable, we need to implement Python's magic double underscore method __getitem__.
def __getitem__(self, key):
    if isinstance(key, str):
        return self.get_addresses_by_first_letters(key)
    if isinstance(key, slice):
        start, stop, step = key.start, key.stop, key.step
        letters = (string.ascii_uppercase[string.ascii_uppercase.index(start):string.ascii_uppercase.index(stop)+1:step])
        return self.get_addresses_by_first_letters(letters)
At first, we check if our key is a str. This will be the case if we access our object with a single letter in square brackets like so: address_book["A"]. We can just return any addresses whose name starts with the given letter for this trivial case.
The interesting part is when the key is a slice object. For example, an access like address_book["A":"H"] would match that condition.
First, we identify all letters alphabetically between A and H. The string module in Python lists all (Latin) letters in string.ascii_uppercase. We use a slice to extract the letters between the given letters. Note the +1 in the second slice parameter. This way, we ensure that the last letter is inclusive, not exclusive.
After we have determined all letters in our sequence, we use the get_addresses_by_first_letters method, which we already discussed. This gives us the result we want.
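As written, the slice branch expects both letters to be given in uppercase, so an access like address_book["A":] would fail because index(None) raises an error. One possible way to handle open-ended and lowercase slices is sketched below. The helper letters_for_slice is hypothetical and not part of the class above. It assumes a missing start means "A" and a missing stop means "Z", and it does not try to support negative steps.

```python
import string


def letters_for_slice(key):
    """Expand a slice of letters, e.g. "A":"H", into a string of letters.

    Hypothetical helper for illustration. Missing bounds fall back to
    "A" and "Z"; the stop letter is inclusive.
    """
    alphabet = string.ascii_uppercase
    start = 0 if key.start is None else alphabet.index(key.start.upper())
    stop = len(alphabet) if key.stop is None else alphabet.index(key.stop.upper()) + 1
    return alphabet[start:stop:key.step]


print(letters_for_slice(slice("A", "H")))     # ABCDEFGH
print(letters_for_slice(slice(None, "c")))    # ABC
print(letters_for_slice(slice("x", None)))    # XYZ
print(letters_for_slice(slice("A", "H", 2)))  # ACEG
```

Inside __getitem__, the slice branch could then pass letters_for_slice(key) to get_addresses_by_first_letters.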