Sebastian G. Vinci
Posted on February 4, 2019
I wanted to talk about sets, and four great operations they provide:
- Intersection: Elements two sets have in common.
- Union: All the elements from both sets.
- Difference: Elements present on one set, but not on the other.
- Symmetric Difference: Elements from both sets, that are not present on the other.
We are going to do this using Python (3.7), because it rules!
So, let's dive right into it.
Some basic understanding of sets.
Sets are data structures that, based on specific rules they define, provide certain cool operations and guarantees to make a lot of things easier when we use them.
Two basic rules we need to know are these:
- Items in a set are unique. We can not have a set that contains two items that are equal.
- Items in a set are not ordered. Depending on the implementation, items may be sorted using the hash of the value stored, but when you use sets you need to assume items are ordered in a random manner.
And a cool guarantee they provide us: Checking if an element is present on a set has a time complexity constant (O(1)), always. This means, checking if a set contains an element is super fast.
The most popular implementation of a set used to achieve this is the hashed set. Oversimplified, it basically stores the items in a key-value store, where the key is a hash of the value stored.
So, to create a set in Python we use the constructor set
which receives any iterable: my_pets = set(['dog', 'cat', 'parrot'])
(no, I don't have a parrot, nor a cat, dog person here).
Curly braces may be used too, which will avoid creating an intermediate list, but as they are also used for dictionaries, you can not create an empty set with curly braces. Also, someone can get confused, so I avoid them and stick to the constructor.
Sets are collections, therefore they are iterable:
my_pets = set(['dog', 'cat', 'parrot'])
for pet in my_pets:
print('Hello, {}!'.format(pet))
The above program serves as a demonstration of how elements in a set are ordered, below is the output of the program where you can see the order I defined was not respected by the interpreter:
Hello, dog!
Hello, parrot!
Hello, cat!
So, with this information, let's just dive into these cool operations.
Intersection
The intersection between two sets, results in a third set that contains the elements present in both.
For example, if I calculate the intersection between {1, 2, 3}
and {2, 3, 4}
, the output will be {2, 3}
.
If this helps you understand, here is a naive implementation in Python:
a = set([1, 2, 3])
b = set([2, 3, 4])
result = {element for element in a if element in b}
print(result)
Man, is Python awesome or what? Of course, this will print {2, 3}
as expected. Anyway, that was irrelevant because Python provides a function to do this as part of its standard library. The same can be achieved with the following:
a = set([1, 2, 3])
b = set([2, 3, 4])
result = a.intersection(b)
print(result)
Notice that, because of the nature of the operation, a.intersection(b)
and b.intersection(a)
is the same. It is a commutative operation.
Union
The union between two sets, results in a third set with all the elements from both sets.
For example, if I calculate the union between {1, 2, 3}
and {2, 3, 4}
, the output will be {1, 2, 3, 4}
. Kind of like a concatenation, but having in mind that sets can not have repeated elements.
Again, let's see a naive implementation in Python:
a = set([1, 2, 3])
b = set([2, 3, 4])
c = set()
for element in a:
c.add(element)
for element in b:
c.add(element)
print(c)
As expected, the output is {1, 2, 3, 4}
. Again, though, Python does provide a native function to do this:
a = set([1, 2, 3])
b = set([2, 3, 4])
c = a.union(b)
print(c)
Notice that, because of the nature of the operation, a.union(b)
and b.union(a)
is the same. It is a commutative operation.
Difference
The difference between two sets results in a third set with the element from the first, that are not present on the second. Right now I'm going to tell you, this operation is not commutative.
For example, if I have a set a = {1, 2, 3}
and a set b = {2, 3, 4}
, the difference between a and b
is {1}
and the difference between b and a
is {4}
.
Here's a naive implementation:
a = set([1, 2, 3])
b = set([2, 3, 4])
c = {element for element in a if element not in b}
print(c)
This outputs {1}
as expected, and will print {4}
if we invert a
and b
. Again, an implementation using Python's standard lib:
a = set([1, 2, 3])
b = set([2, 3, 4])
c = a.difference(b)
print(c)
Symmetric difference
The symmetric difference between two sets results in a third set with the elements, from both sets, that are not present on the other.
For example, if I calculate the symmetric difference between {1, 2, 3}
and {2, 3, 4}
, the output will be {1, 4}
. This operation is commutative.
Here's a naive implementation:
a = set([1, 2, 3])
b = set([2, 3, 4])
c = set()
for element in a:
if element not in b:
c.add(element)
for element in b:
if element not in a:
c.add(element)
print(c)
The above implementation will print {1, 4}
, as expected. Of course, Python has this on its standard libary too: symmetric_difference
.
a = set([1, 2, 3])
b = set([2, 3, 4])
c = a.symmetric_difference(b)
print(c)
Conclusion
Sets are really useful, and actually pretty fun. Give them a try. I lived a lot of time constrained to lists, and when I finally sat down and learned how to work with sets started finding better, more elegant solutions to a lot of problems.
Posted on February 4, 2019
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.