M Sharma
Posted on July 20, 2020
So first things first: how much speedup you get depends on the number of threads or processes you decide to use.
We are going to use the Python package joblib - and specifically two constructs from it:
- 'Parallel', which lets us define the number of parallel execution paths (n_jobs) we want to create, and
- 'delayed', a wrapper for the function we want to execute in parallel.
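Put together, the two look like this (a minimal sketch; square and numbers here are hypothetical stand-ins for your own function and data):

from joblib import Parallel, delayed

def square(n):
    return n * n

numbers = [1, 2, 3, 4]
# run square() over numbers on 2 parallel workers
results = Parallel(n_jobs=2)(delayed(square)(n) for n in numbers)
print(results)  # [1, 4, 9, 16]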
Non parallel way :-
Write a function which takes some input(s) and performs the processing you want for your use case. Call this function in a loop over your items and you are done.
This is called linear processing, or sequential execution.
Now, there can be cases where the items in your loop (and you have multiple items to process) could, given a way, be processed independently of each other.
A few examples :-
Resizing images stored in a directory. Your end goal is to resize all the images, so there is no need to wait for the first image to finish resizing before starting the resize operation on another (see the sketch after these examples).
Gathering data by making an API call (or multiple API calls) per entity (e.g. user id, item id etc.). Let's say you have a list of users and you want to gather various data points/attributes for each of them (transactions performed, items bought, events attended etc.) by calling different downstream APIs.
Crawling different web sites and storing info, etc. etc. You get the idea.
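For instance, the image-resizing case could look roughly like this (a minimal sketch, assuming the Pillow library is installed; the directory names and target size are made up for illustration):

import os
from joblib import Parallel, delayed
from PIL import Image

def resize_image(filename):
    # open the image, resize it and save it to an output directory
    img = Image.open(os.path.join('images', filename))
    img.resize((256, 256)).save(os.path.join('resized', filename))

filenames = os.listdir('images')
# every image is independent of the others, so resize them on 4 parallel workers
Parallel(n_jobs=4)(delayed(resize_image)(f) for f in filenames)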
Convert linear to parallel :-
So the first step remains the same: you write a function which takes the input(s) and performs the logic. It may or may not return anything; e.g. in our image-resize case it could simply save the resized images and that is all, whereas in the data-gathering case it could return a dict per item/user. We can handle both scenarios.
So take a look at the following code with a simple function (process_data) that we first run sequentially and then parallelize using joblib.
import time
import math

from joblib import Parallel, delayed


def process_data(x):
    # adding sleep for half a second to simulate time-consuming logic
    time.sleep(0.5)
    return math.sqrt(x)


if __name__ == "__main__":
    # list of items we want to process. This could be a list of images, users, websites etc.
    items = range(100, 2001, 100)

    # So that we can compare later, let's do the sequential/linear execution first.
    linear_start_time = time.time()
    for i in items:
        process_data(i)
    linear_end_time = time.time()
    print('Linear execution took {:.4f} seconds'.format(linear_end_time - linear_start_time))

    # Here comes parallel execution.
    # n_jobs defines the number of parallel executions you want to perform.
    parallel_start_time = time.time()
    Parallel(n_jobs=10)(delayed(process_data)(item) for item in items)
    parallel_end_time = time.time()
    print('Parallel execution took {:.4f} seconds'.format(parallel_end_time - parallel_start_time))
If you run the above code you will see something similar to the following:
Linear execution took 10.0715 seconds
Parallel execution took 2.6840 seconds
which shows the program speeding up by almost 4x! (With n_jobs=10, the twenty half-second tasks run in two batches of ten; the remaining time is mostly the overhead of spawning the worker processes.)
Also, Parallel returns the list of outputs of the function passed to delayed, so you can handle the return values easily. Modify the above code slightly to capture the returns, i.e.
results = Parallel(n_jobs=10)(delayed(process_data)(item) for item in items)
print(results)
[10.0, 14.142135623730951, 17.320508075688775, 20.0, 22.360679774997898, 24.49489742783178, 26.457513110645905, 28.284271247461902, 30.0, 31.622776601683793, 33.166247903554, 34.64101615137755, 36.05551275463989, 37.416573867739416, 38.72983346207417, 40.0, 41.23105625617661, 42.42640687119285, 43.58898943540674, 44.721359549995796]
Now, there is a lot more to explore with joblib. There are ways to parallelize execution by threads or by processes, a disk-caching mechanism for memoizing expensive calls, etc. But I hope this article gives a good enough explanation for beginners to start!
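For a taste of those features, here is a minimal sketch (the cache directory name is arbitrary; prefer="threads" and Memory are part of joblib's API):

import math
import time
from joblib import Parallel, delayed, Memory

def process_data(x):
    time.sleep(0.5)  # simulate slow work
    return math.sqrt(x)

items = range(100, 2001, 100)

# prefer="threads" asks joblib to use threads instead of processes,
# which suits I/O-bound work like API calls
results = Parallel(n_jobs=10, prefer="threads")(delayed(process_data)(item) for item in items)

# Memory caches function results on disk; a second call with the same
# argument is read back from the cache instead of recomputed
memory = Memory('./joblib_cache', verbose=0)
cached_process_data = memory.cache(process_data)
print(cached_process_data(400))  # computed and written to the cache
print(cached_process_data(400))  # served from the cache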
P.S. Look at concurrent.futures for Python's built-in support for asynchronous execution.
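The equivalent of our parallel run with concurrent.futures looks roughly like this (a minimal sketch using the standard-library ThreadPoolExecutor):

import math
import time
from concurrent.futures import ThreadPoolExecutor

def process_data(x):
    time.sleep(0.5)  # simulate slow work
    return math.sqrt(x)

items = range(100, 2001, 100)

# map process_data over items using a pool of 10 threads
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(process_data, items))
print(results)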