Python concurrent.futures-CodePudding

I have a multiprocessing code, and each process have to analyse same data differently.

I have implemented:

with concurrent.futures.ProcessPoolExecutor() as executor:
   res = executor.map(goal_fcn, p, [global_DataFrame], [global_String])

for f in concurrent.futures.as_completed(res):
   fp = res

and function:

def goal_fcn(x, DataFrame, String):
   return heavy_calculation(x, DataFrame, String)

the problem is goal_fcn is called only once, while should be multiple time

In debugger, I checked now the variable p is looking, and it has multiple columns and rows. Inside goal_fcn, variable x have only first row - looks good.

But the function is called only once. There is no error, the code just execute next steps.

Even if I modify variable p = [1,3,4,5], and of course code. goal_fcn is executed only once

I have to use map() because keeping the order between input and output is required

CodePudding user response：

map works like zip. It terminates once at least one input sequence is at its end. Your [global_DataFrame] and [global_String] lists have one element each, so that is where map ends.

There are two ways around this:

Use itertools.product. This is the equivalent of running "for all data frames, for all strings, for all p". Something like this:

def goal_fcn(x_DataFrame_String):
    x, DataFrame, String = x_DataFrame_String
    ...

executor.map(goal_fcn, itertools.product(p, [global_DataFrame], [global_String]))

Bind the fixed arguments instead of abusing the sequence arguments.

def goal_fcn(x, DataFrame, String):
    pass

bound = functools.partial(goal_fcn, DataFrame=global_DataFrame, String=global_String)
executor.map(bound, p)