I am trying to use parallel processing in Python with the following code:
import os
import datetime
import numpy as np
import FarimaModule
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt
import multiprocessing as mp
# Here I define some variables: p_max,q_max,m_list,wlen,mstep,fs, listFile
def implement(fname,p_max,q_max,m_list,wlen,mstep,fs):
    # It is a really long code
# run the function 'implement' in parallel for different values of the input variable 'fname'
pool = mp.Pool(10)
results = [pool.apply(implement, args=(fname,p_max,q_max,m_list,wlen,mstep,fs)) for fname in listFile]
pool.close()
But it throws the following error:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Others have posted questions with the same error, but I am not able to implement the solutions posted there because it is unclear how to adapt them to my code.
CodePudding user response:
On some systems, multiprocessing has to spawn a new copy of Python and import your module to get to the worker code. Anything at module level is executed again... including the parent code that creates the pool. This would be an infinite recursion, except that Python detects the problem and gives you a handy tip. Follow it by guarding the pool-creating code:
import os
import datetime
import numpy as np
import FarimaModule
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt
import multiprocessing as mp
# Here I define some variables: p_max,q_max,m_list,wlen,mstep,fs, listFile
def implement(fname,p_max,q_max,m_list,wlen,mstep,fs):
    # It is a really long code

if __name__ == "__main__":
    # run the function 'implement' in parallel for different values of the input variable 'fname'
    pool = mp.Pool(10)
    results = [pool.apply(implement, args=(fname,p_max,q_max,m_list,wlen,mstep,fs))
               for fname in listFile]
    pool.close()
A top-level Python script always has the name "__main__". When imported by the subprocess, it is now a module and has a different name.
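To see this in action, here is a minimal, self-contained sketch (the file name demo_spawn.py and the work function are made up for illustration). With the spawn start method, the module-level print runs again in every worker process, while the guarded block runs only in the parent:

# demo_spawn.py - hypothetical illustration of the re-import behaviour
import multiprocessing as mp

# Module-level code: under the "spawn" start method this line runs in the
# parent *and* again in every child process that re-imports this file.
print("importing module in process:", mp.current_process().name)

def work(x):
    # trivial stand-in for a real worker function
    return x * x

if __name__ == "__main__":
    # Only the parent process has __name__ == "__main__", so the pool is
    # created exactly once and the recursion warned about in the error
    # message cannot happen.
    mp.set_start_method("spawn")  # force spawn even where fork is the default
    with mp.Pool(2) as pool:
        print(pool.map(work, range(4)))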
pool.apply is likely not the method you want - it waits for the pool worker to complete. map may be the better choice: it chunks (groups) the input, and in your case, with an expensive calculation, you likely want a small chunksize. starmap is just map with multiple parameters.
if __name__ == "__main__":
    # run the function 'implement' in parallel for different values of the input variable 'fname'
    with mp.Pool(10) as pool:
        results = pool.starmap(implement,
            [(fname,p_max,q_max,m_list,wlen,mstep,fs)
                for fname in listFile],
            chunksize=1)
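If you prefer to keep the per-file call style of the original code, the non-blocking variant of apply is apply_async. Here is a minimal sketch (reusing the names from the question) that submits all files at once and collects the results afterwards:

if __name__ == "__main__":
    with mp.Pool(10) as pool:
        # apply_async returns immediately with an AsyncResult; .get() waits
        # for and returns the worker's return value.
        async_results = [pool.apply_async(implement,
                             args=(fname,p_max,q_max,m_list,wlen,mstep,fs))
                         for fname in listFile]
        results = [r.get() for r in async_results]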