I want to create two lists by running two functions in parallel, each of which returns one value per call. My code below works, but it still takes too long. Is there a more efficient way to parallelize this code?
import random
import time
from joblib import Parallel, delayed

catchments = 50  # number of catchments to plot
randomlist = random.sample(range(2, 2100), catchments)

def budx(i):  # Time-consuming task
    try:
        catch = slicecatch(i)
        return catch.PETNatVeg.mean().values / catch.Prec.mean().values
    except IndexError:
        pass  # implicitly returns None for this catchment

def budy(i):  # Time-consuming task
    try:
        catch = slicecatch(i)
        return catch.TotalET.mean().values / catch.Prec.mean().values
    except IndexError:
        pass  # implicitly returns None for this catchment

start_time = time.perf_counter()
bud_x = Parallel(n_jobs=-1)(delayed(budx)(i) for i in randomlist)
bud_y = Parallel(n_jobs=-1)(delayed(budy)(i) for i in randomlist)
finish_time = time.perf_counter()
CodePudding user response:
The way you've written your code, you're first running all your budx
instances, waiting for them to complete, and only then running your budy
instances. That is, you are sequentially running two sets of parallel tasks.
One possible fix is to pass both sets of tasks to a single Parallel call, so that the budx and budy instances all run concurrently. Note that (a) I was not previously familiar with joblib, so there may be a more canonical form, and (b) I've replaced your budx and budy implementations with code that I can actually run:
import time
from joblib import Parallel, delayed
import random

catchments = 50  # number of catchments to plot
randomlist = random.sample(range(2, 2100), catchments)

def budx(i):  # Time-consuming task (simulated with a random sleep)
    print("start budx", i)
    time.sleep(random.randint(0, 10))
    print("end budx", i)
    return ("budx", i)

def budy(i):  # Time-consuming task (simulated with a random sleep)
    print("start budy", i)
    time.sleep(random.randint(0, 10))
    print("end budy", i)
    return ("budy", i)

start_time = time.perf_counter()
# Concatenate both task lists so a single Parallel call schedules all of them
results = Parallel(n_jobs=-1)(
    [delayed(budx)(i) for i in range(5)]
    + [delayed(budy)(i) for i in range(5)])
finish_time = time.perf_counter()

print("total time:", finish_time - start_time)
print("results", results)
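Since you want two separate lists, the combined results have to be split again afterwards. By default joblib returns results in the order the tasks were submitted, so slicing should be enough. Here is a minimal sketch of that step; the results list is a hand-built stand-in for what the Parallel call returns, and n_tasks is a name I made up for the number of tasks per function:

```python
# Stand-in for the list returned by the single Parallel(...) call:
# five budx results followed by five budy results, in submission order.
n_tasks = 5
results = [("budx", i) for i in range(n_tasks)] + [("budy", i) for i in range(n_tasks)]

# The first n_tasks entries came from budx, the remaining ones from budy.
bud_x = results[:n_tasks]
bud_y = results[n_tasks:]

print("bud_x", bud_x)
print("bud_y", bud_y)
```

With your real code, n_tasks would be len(randomlist).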
If I were writing this, I would probably opt for standard-library tools like concurrent.futures rather than a third-party module like joblib (unless joblib provides additional features that make your life easier).
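For comparison, here is a rough sketch of the same idea with concurrent.futures. It is not a drop-in replacement for your code: budx and budy are again placeholders with short sleeps, run_all and max_workers=8 are my own choices, and I've used ThreadPoolExecutor only because it runs anywhere without extra setup. If your real tasks are CPU-bound, ProcessPoolExecutor (same submit/result interface, but it needs an `if __name__ == "__main__":` guard on some platforms) is probably the better fit.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def budx(i):
    # Placeholder for the real time-consuming task
    time.sleep(random.uniform(0, 0.05))
    return ("budx", i)


def budy(i):
    # Placeholder for the real time-consuming task
    time.sleep(random.uniform(0, 0.05))
    return ("budy", i)


def run_all(indices, max_workers=8):
    # Submit every budx and budy task to the same pool so they all run
    # concurrently, then collect each group in submission order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        x_futures = [executor.submit(budx, i) for i in indices]
        y_futures = [executor.submit(budy, i) for i in indices]
        bud_x = [f.result() for f in x_futures]
        bud_y = [f.result() for f in y_futures]
    return bud_x, bud_y


bud_x, bud_y = run_all(range(5))
print("bud_x", bud_x)
print("bud_y", bud_y)
```

Keeping the two lists of futures separate means you get bud_x and bud_y back directly, with no slicing needed.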