`zip` tricks ✨ in Python

vladignatyev

Vladimir Ignatev

Posted on December 16, 2023

I noticed that my recent daily Python quiz about zip (I mean the iterable, not the .zip archive) received a decent amount of attention. Thank you to each and every participant!

No wonder: iterables in Python are an essential tool for working with data. A few reasons make iterables popular and fun: performance, memory efficiency and expressive syntax.

One of the hardest things to understand in the zoo of iterables is, indeed, zip. When I faced this animal for the first time, I realized its hidden power and got disappointed at the same time 🤭

Actually, zip serves two purposes:

  1. First, to "sew" two (or more) lists together.
  2. Second, to tear them apart.

To illustrate this idea, I prepared the following example.

Suppose that we have a column-oriented database (e.g. ClickHouse).

We are given 3 separate columns of data: first names, last names and professions. Every column contains data about the people who use our software. In the real world, we could obtain this data from a DBMS, APIs, physical files, user input, etc.

# Three columns from the columnar database 
first_names = ["Elon", "Steve", "Bob"]
last_names  = ["Musk", "Jobs", "Dorf"] 
professions = ["builds rockets", "grows Macs", "helps startups"]

"Sew" two or more lists together and make a table from columns using zip

The first scenario is to merge them into a single list of three persons, rather than three lists, one per field. In other words, we want to build a table of 3 records from these 3 lists. This is the same way an RDBMS (e.g. Postgres) stores data, and the same way people from the OOP world think about it.

zip(first_names, last_names, professions)

This expression returns an iterator, which is itself an iterable and produces the records lazily. You're free to materialize it into a list if you wish and finally obtain the desired table in memory:

>>> table = list(zip(first_names, last_names, professions))
>>> table
[('Elon', 'Musk', 'builds rockets'),
 ('Steve', 'Jobs', 'grows Macs'),
 ('Bob', 'Dorf', 'helps startups')]

Now, table[0] is the record for Elon Musk, who builds rockets, table[1] is the record for Steve Jobs, who enjoyed growing Macs and Apples (rest in peace, dear Steve!), and so on.
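If we only need to walk through the records once, the list is optional. Here is a minimal sketch, reusing the three columns from above, that shows what "lazy" means in practice: the object returned by zip hands out records one by one and can be consumed only once.

```python
first_names = ["Elon", "Steve", "Bob"]
last_names = ["Musk", "Jobs", "Dorf"]
professions = ["builds rockets", "grows Macs", "helps startups"]

# Nothing is combined yet: zip only remembers the three sources
records = zip(first_names, last_names, professions)

# Pull a single record on demand
first = next(records)

# Consume whatever is left (the two remaining records)
rest = list(records)

# The iterator is now exhausted, so a second pass yields nothing
again = list(records)

print(first)   # ('Elon', 'Musk', 'builds rockets')
print(rest)    # [('Steve', 'Jobs', 'grows Macs'), ('Bob', 'Dorf', 'helps startups')]
print(again)   # []
```

So if the same table is needed twice, either materialize it once or call zip again on the original columns.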

Tear records apart and turn them back into columns

To achieve this, pass the table to zip with an asterisk, which unpacks every record into a separate argument:

>>> cols = list(zip(*table))
>>> cols
[('Elon', 'Steve', 'Bob'),
 ('Musk', 'Jobs', 'Dorf'),
 ('builds rockets', 'grows Macs', 'helps startups')]

And back again: we obtained columns instead of records! For example, cols[2] holds the professions and cols[0] contains only the first names.
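The torn-apart columns can also be unpacked straight into named variables. The sketch below assumes the same three-record table; note that the columns come back as tuples rather than lists, and that the asterisk reads the whole table before zip is even called.

```python
table = [
    ("Elon", "Musk", "builds rockets"),
    ("Steve", "Jobs", "grows Macs"),
    ("Bob", "Dorf", "helps startups"),
]

# zip(*table) is the same as zip(table[0], table[1], table[2]);
# it yields exactly three columns, so they can be unpacked into three names
first_names, last_names, professions = zip(*table)

print(first_names)   # ('Elon', 'Steve', 'Bob')
print(professions)   # ('builds rockets', 'grows Macs', 'helps startups')

# Sewing the columns back together restores the original records
round_trip = list(zip(first_names, last_names, professions))
print(round_trip == table)   # True
```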

Iterables

Actually, eagerly wrapping every iterable in a list is not a good idea. In the real world, columns or records can be very large and may not even fit in memory. Every time we turn an iterable into a list, we allocate memory to store all the data.

Furthermore, columns or records may arrive as iterables themselves, and we may need to process them further or stream the result to another physical device.

To avoid unnecessary allocations, we have to keep iterables as iterables and not convert them into lists too early.
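As a rough sketch of this idea, the example below sews three streamed columns into CSV rows without ever building a list of records. The `column` generator is a made-up stand-in for data arriving from a DBMS or an API, and `io.StringIO` plays the role of the file or device we stream to.

```python
import csv
import io


def column(values):
    # Hypothetical stand-in for a column streamed from a DBMS or an API
    for value in values:
        yield value


first_names = column(["Elon", "Steve", "Bob"])
last_names = column(["Musk", "Jobs", "Dorf"])
professions = column(["builds rockets", "grows Macs", "helps startups"])

# In-memory stand-in for a real file, socket or other output device
sink = io.StringIO()
writer = csv.writer(sink)

# writerows pulls records from zip one at a time and writes each one out;
# no intermediate list of records is created
writer.writerows(zip(first_names, last_names, professions))

print(sink.getvalue())
```

Keep in mind that only the "sewing" direction streams like this: zip(*records) has to read all the records first, because the asterisk unpacks them into arguments.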

Flame 🔥. Comment. Make cool things!
