I have a multiprocessing code, and each process have to analyse same data differently.
I have implemented:
with concurrent.futures.ProcessPoolExecutor() as executor:
res = executor.map(goal_fcn, p, [global_DataFrame], [global_String])
for f in concurrent.futures.as_completed(res):
fp = res
and function:
def goal_fcn(x, DataFrame, String):
return heavy_calculation(x, DataFrame, String)
the problem is goal_fcn
is called only once, while should be multiple time
In debugger, I checked now the variable p
is looking, and it has multiple columns and rows. Inside goal_fcn
, variable x
have only first row - looks good.
But the function is called only once. There is no error, the code just execute next steps.
Even if I modify variable p = [1,3,4,5]
, and of course code. goal_fcn
is executed only once
I have to use map()
because keeping the order between input and output is required
CodePudding user response:
map
works like zip
. It terminates once at least one input sequence is at its end. Your [global_DataFrame]
and [global_String]
lists have one element each, so that is where map ends.
There are two ways around this:
- Use
itertools.product
. This is the equivalent of running "for all data frames, for all strings, for all p". Something like this:
def goal_fcn(x_DataFrame_String):
x, DataFrame, String = x_DataFrame_String
...
executor.map(goal_fcn, itertools.product(p, [global_DataFrame], [global_String]))
- Bind the fixed arguments instead of abusing the sequence arguments.
def goal_fcn(x, DataFrame, String):
pass
bound = functools.partial(goal_fcn, DataFrame=global_DataFrame, String=global_String)
executor.map(bound, p)