I have an issue with parallelising some code using joblib.Parallel.
When the backend is threading, it works as intended: both prints show the expected result.
When I change the backend to multiprocessing, the code
- runs way faster
- the first print works as intended
- the second print (which is the final result) shows only None values, completely ignoring what the first print showed
Here is a similar MWE, once per backend:
# Version 1: threading backend
from joblib import Parallel, delayed

def E_th(i, tt, out_list):
    out_list[tt] = tt * i  # operator was lost in the post; multiplication assumed
    print(out_list[tt])  # >> prints correct results
    return 1

if __name__ == "__main__":
    time = range(0, 10)
    for i in range(0, 2):
        out_list = [None] * len(time)
        Parallel(n_jobs=64, backend='threading')(delayed(E_th)(i, tt, out_list) for tt in range(len(time)))
        print(out_list)  # >> prints correct results

# Version 2: multiprocessing backend
from joblib import Parallel, delayed

def E_th(i, tt, out_list):
    out_list[tt] = tt * i  # operator was lost in the post; multiplication assumed
    print(out_list[tt])  # >> prints correct results
    return 1

if __name__ == "__main__":
    time = range(0, 10)
    for i in range(0, 2):
        out_list = [None] * len(time)
        Parallel(n_jobs=64, backend='multiprocessing')(delayed(E_th)(i, tt, out_list) for tt in range(len(time)))
        print(out_list)  # >> prints [None, None, ...]

I'm probably super bad at this, so if there is a simple way to understand what's going on, I'll try to fix it :)
CodePudding user response:
Multithreaded: out_list is passed by reference to the child threads, which all share the parent's memory. So when one thread changes the list, the change is visible in every thread, including the parent.
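
The shared-memory behaviour can be sketched with the standard threading module instead of joblib (an assumption made here to keep the example dependency-free). The worker function is a hypothetical stand-in for E_th, and the multiplication is assumed, as in the question's code.

```python
import threading

def worker(i, tt, out_list):
    # Every thread receives a reference to the same list object,
    # so this write lands in the parent's list.
    out_list[tt] = tt * i

out_list = [None] * 5
threads = [
    threading.Thread(target=worker, args=(2, tt, out_list))
    for tt in range(len(out_list))
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every slot has been written

print(out_list)  # [0, 2, 4, 6, 8]
```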
Multiprocess: out_list (in fact the whole memory footprint) is copied to the child processes. Each child updates its own copy, so the change is not propagated back up to the parent, where the final print happens. The print inside E_th runs in the child, which is why it still shows the correct value.
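
A rough way to see the copy without starting any processes: process-based backends send arguments to the workers by pickling them, so the worker effectively gets the result of a pickle round trip. The sketch below simulates that step by hand. It is a simplified model, not joblib's actual code path, and worker is again a hypothetical stand-in for E_th.

```python
import pickle

def worker(i, tt, out_list):
    # Writes into whatever list object it was handed, then returns the value.
    out_list[tt] = tt * i
    return out_list[tt]

parent_list = [None] * 3

# Simulate shipping the argument to another process:
# it is serialised and rebuilt as a brand-new object.
child_list = pickle.loads(pickle.dumps(parent_list))

returned = worker(2, 1, child_list)

print(child_list)   # [None, 2, None]  (the copy was updated)
print(parent_list)  # [None, None, None]  (the original was not)
print(returned)     # 2
```

The return value is the part that does travel back to the parent. In the question's code, the usual fix is therefore to have E_th return tt * i rather than 1, and to assign the list that the Parallel(...)(...) call returns to out_list; the results come back in the order the tasks were submitted.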