Let's say we have some operation like:
groups = ['A','B','C']
idx = [n for n in range(1000)]
for group in groups:
    for i in idx:
        # Compute something
where idx is much larger than groups.
To speed this up, I have looked at multiprocessing and joblib in Python. However, should we parallelize the outer loop (distribute the for group in groups iterations across workers), or parallelize the inner loop (distribute the for i in idx iterations across workers)? A sketch of the two options is below.
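For concreteness, here is a minimal sketch of the two options using joblib (assuming joblib is installed; compute_something(group, i) is just a placeholder for the real loop body):

from joblib import Parallel, delayed

def compute_something(group, i):
    # placeholder for the real work done in the loop body
    return (group, i)

def compute_group(group, idx):
    # run the entire inner loop for one group inside a single worker
    return [compute_something(group, i) for i in idx]

groups = ['A', 'B', 'C']
idx = [n for n in range(1000)]

# Option 1: parallelize the outer loop -- one task per group (only 3 tasks)
outer_results = Parallel(n_jobs=-1)(
    delayed(compute_group)(group, idx) for group in groups
)

# Option 2: parallelize the inner loop -- one task per (group, i) pair (3000 tasks)
inner_results = Parallel(n_jobs=-1)(
    delayed(compute_something)(group, i) for group in groups for i in idx
)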
CodePudding user response:
This depends heavily on the number of groups, the number of cores, how heavy the actual computation is, and probably several other factors I'm forgetting. You can avoid having to think about it by creating a single iterator that produces all the (group, i) tuples visited by the inner loop, i.e. by collapsing the two loops into one. This can be done with itertools.product (the Cartesian product):
Rough example:
from itertools import product
from multiprocessing import Pool

with Pool() as p:
    # each worker call receives one (group, i) tuple
    results = p.map(compute_something, product(groups, idx))
This should work decently well in most situations.
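One detail worth noting: with Pool.map each call receives a single (group, i) tuple. If the worker function is instead written to take two separate arguments, Pool.starmap unpacks the tuples for you, and passing a chunksize larger than 1 reduces inter-process overhead when the per-item work is tiny. A small sketch, assuming compute_something(group, i) takes two arguments (not code from the original answer):

from itertools import product
from multiprocessing import Pool

def compute_something(group, i):
    # stand-in for the real per-item work
    return (group, i)

groups = ['A', 'B', 'C']
idx = range(1000)

if __name__ == '__main__':
    with Pool() as p:
        # starmap unpacks each (group, i) tuple into two arguments;
        # chunksize batches items so workers spend less time on bookkeeping
        results = p.starmap(compute_something, product(groups, idx), chunksize=100)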
CodePudding user response:
The main thing to determine is what will bind the task.
A handful of simultaneous computational tasks can pin a CPU.
On the other hand, a CPU can sit almost idle while handling 10,000 processes that spend their time waiting on the network. The difficulty is making sure that the job queue for each worker isn't too long or too short.
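To make that concrete, here is a small sketch (not from the original answer) of how the nature of the work usually steers the choice of pool, with chunksize used to keep each worker's queue reasonably balanced:

import time
from itertools import product
from multiprocessing import Pool              # processes: suits CPU-bound work
from multiprocessing.pool import ThreadPool   # threads: suit I/O-bound work

def cpu_bound(args):
    group, i = args
    # stand-in for heavy computation that keeps a core busy
    return sum(j * j for j in range(10_000))

def io_bound(args):
    group, i = args
    # stand-in for a network call or disk read that mostly waits
    time.sleep(0.01)
    return (group, i)

groups = ['A', 'B', 'C']
idx = range(1000)

if __name__ == '__main__':
    # CPU-bound: a handful of worker processes can pin every core
    with Pool() as p:
        p.map(cpu_bound, product(groups, idx), chunksize=50)

    # I/O-bound: many threads can be in flight while the CPU stays nearly idle
    with ThreadPool(50) as tp:
        tp.map(io_bound, product(groups, idx), chunksize=10)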