Mark Douthwaite
Posted on December 16, 2020
This post takes a whistlestop tour of tqdm: a fantastic, easy-to-use, extensible progress bar package for Python. It makes adding simple progress bars to Python processes extremely easy. If you’re a software engineer of some experience, chances are you’ll have used or developed algorithms or data transformations that can take a fair while – perhaps many hours or even days – to complete.
It's not uncommon for software folks to simply print status messages to the console or, in some slightly more sophisticated cases, to use the (excellent and recommended) built-in logging module. In a lot of cases this may well be fine. However, if you’re running a task with many hundreds of steps, or over a data structure with many millions of elements, these approaches are sometimes a little unclear and verbose, and frankly kind of ugly.
Show me the code!
That’s where tqdm can come in. It has a nice clean API that lets you quickly add progress bars to your code. Plus it has a lightweight ‘time-remaining’ estimation algorithm built in to the progress bar too. For the purposes of this post, take a look at the super-minimal example of a mocked-up loop for web scraping using tqdm, below:
import time
from tqdm import tqdm

def get():
    # Stand-in for a real HTTP request.
    time.sleep(0.25)

with tqdm(total=100) as progress:
    for i in range(100):
        get()
        progress.update(1)
In this simple example, you set up a tqdm progress bar that expects a process of 100 steps (say 100 URLs). Then you run the loop (with a 0.25 second pause at each step), updating the progress bar each time a step completes. You can also update the progress bar by arbitrary amounts, or break out of the loop early. That’s two lines of code (plus the import statement) to get a nice little progress bar in your code.
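To illustrate updating by arbitrary amounts, here is a minimal sketch that processes items in batches and advances the bar by the size of each batch. The process_in_batches function, the batch size, and the item list are all made up for illustration; only tqdm's total argument and update method come from the example above.

```python
from tqdm import tqdm


def process_in_batches(items, batch_size):
    """Process items in chunks, advancing the bar by each chunk's size."""
    processed = 0
    with tqdm(total=len(items), desc="Batches") as progress:
        for start in range(0, len(items), batch_size):
            batch = items[start:start + batch_size]
            # Hypothetical stand-in for real work on the batch.
            processed += len(batch)
            # Advance by the number of items handled, not by 1.
            progress.update(len(batch))
    return processed


if __name__ == "__main__":
    process_in_batches(list(range(95)), batch_size=10)
```

Because the final batch may be smaller than the others, updating by len(batch) rather than by batch_size keeps the bar from overshooting its total.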
pandas support
Beyond cool little additions to your program’s outputs, tqdm also integrates nicely with other widely used packages. Take pandas for example, the ubiquitous Python data analysis library. Data scientists love pandas, but some transformations on data frames can take a fair while. Fortunately, there's support for automatically adding a tqdm progress bar to calls to the apply method in pandas. Take a look at the example below:
import pandas as pd
from tqdm import tqdm

df = pd.read_csv("weather.csv")
tqdm.pandas(desc="Applying Transformation")
df.progress_apply(lambda x: x)
When you run this script, you'll see something like this:
Technically, the tqdm.pandas method monkey patches the progress_apply method onto pandas data structures, giving them a modified version of the commonly used apply method. Practically, when you call the progress_apply method, the package wraps the standard pandas apply method with a tqdm progress bar. This can come in really handy when you’re processing large data frames!
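As a rough check on that description, the sketch below applies the same function with progress_apply and with plain apply, then compares the results. The small in-memory data frame and the to_fahrenheit helper are invented for illustration and stand in for the weather.csv file, whose contents aren't shown in this post.

```python
import pandas as pd
from tqdm import tqdm

# Register progress_apply on pandas objects.
tqdm.pandas(desc="Converting")

# Made-up data standing in for weather.csv.
df = pd.DataFrame({"temp_c": [12.5, 18.0, 21.3, 9.8]})


def to_fahrenheit(celsius):
    """Hypothetical transformation used only for this sketch."""
    return celsius * 9 / 5 + 32


# Same computation, with and without a progress bar.
with_bar = df["temp_c"].progress_apply(to_fahrenheit)
without_bar = df["temp_c"].apply(to_fahrenheit)
```

If the wrapping works as described, the two results should be identical, with the progress bar as the only visible difference.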
Parallel processing support
There's another common application that's worth mentioning here: tqdm is also great for setting up progress bars for parallel processes. Here is an example using some of tqdm's built-in support for updating a progress bar for a parallel map:
import time
from tqdm.contrib.concurrent import process_map

def my_process(_):
    time.sleep(0.25)

# The guard keeps worker processes from re-running the map on import.
if __name__ == "__main__":
    r = process_map(my_process, range(0, 100), max_workers=2, desc="MyProcess")
In this case, you'll have a single progress bar that gets updated each time a my_process call finishes. There's a second use case though: what if you've got a few long-running processes and you want to track these individually? This might be preferable if you want to avoid serialising and de-serialising large objects into and out of processes, for example. You can do that too:
import time
import multiprocessing as mp
from tqdm import tqdm

def my_process(pos):
    _process = mp.current_process()
    # position=pos gives each worker its own row in the terminal.
    with tqdm(desc=f"Process {pos}", total=100, position=pos) as progress:
        for _ in range(100):
            time.sleep(0.1)
            progress.update(1)

if __name__ == "__main__":
    n_cpu = mp.cpu_count()
    # Share one lock across workers so their bars don't overwrite each other.
    with mp.Pool(processes=n_cpu, initializer=tqdm.set_lock, initargs=(tqdm.get_lock(),)) as pool:
        pool.map(my_process, range(n_cpu))
This should give you an output something along the lines of:
There's a Gist of this example you can use too.
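For I/O-bound work, such as the mocked-up scraping loop from earlier, threads are often enough. The same tqdm.contrib.concurrent module also provides a thread_map function, and the sketch below shows one way it might be used. The fetch function and the URLs are hypothetical placeholders, not real requests.

```python
import time
from tqdm.contrib.concurrent import thread_map


def fetch(url):
    """Hypothetical stand-in for an HTTP request; returns the URL's length."""
    time.sleep(0.01)
    return len(url)


# Made-up URLs for illustration only.
urls = [f"https://example.com/page/{i}" for i in range(20)]

# Results come back in input order, like the built-in map.
lengths = thread_map(fetch, urls, max_workers=4, desc="Fetching")
```

Since threads share memory, this avoids the serialisation cost mentioned above, though it won't speed up CPU-bound work.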
Jupyter support
The last integration I'll be touching on in this post is the built-in support for using tqdm in a Jupyter Notebook. To do this, you'll need to make sure you've installed Jupyter, as well as ipywidgets. You'll then need to run the following to enable the extension:
jupyter nbextension enable --py widgetsnbextension
With this set up, in a cell in a new notebook, you should be able to run the example from earlier:
from tqdm.notebook import tqdm

arr = list(range(100))
with tqdm(desc="My Progress bar", total=len(arr)) as progress:
    for element in arr:
        progress.update(1)
And see something similar to this:
Cool, right?
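If the same code needs to run both in a notebook and from a terminal, one option is to import from tqdm.auto instead. As far as I'm aware, it picks the notebook widget when it detects a notebook and falls back to the plain console bar otherwise. The sketch below is the notebook example with only the import changed, plus a running total so the loop does something.

```python
from tqdm.auto import tqdm

arr = list(range(100))
total = 0
with tqdm(desc="My Progress bar", total=len(arr)) as progress:
    for element in arr:
        # Trivial stand-in for real per-element work.
        total += element
        progress.update(1)
```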
Further reading
Interested in finding out more about tqdm? Here's their GitHub.
The cover image for this post was taken from a TED talk on progress bars. It's worth checking out.