Python concurrent.futures

Time:02-12

I have multiprocessing code in which each process has to analyse the same data in a different way.

I have implemented:

with concurrent.futures.ProcessPoolExecutor() as executor:
   res = executor.map(goal_fcn, p, [global_DataFrame], [global_String])

for f in concurrent.futures.as_completed(res):
   fp = res

and the function:

def goal_fcn(x, DataFrame, String):
   return heavy_calculation(x, DataFrame, String)

The problem is that goal_fcn is called only once, while it should be called multiple times.

In the debugger, I checked how the variable p looks, and it has multiple columns and rows. Inside goal_fcn, the variable x holds only the first row, which looks correct.

But the function is called only once. There is no error; the code just continues with the next steps.

Even if I change the variable to p = [1,3,4,5] (and adjust the code accordingly), goal_fcn is executed only once.

I have to use map() because the order between input and output must be preserved.

CodePudding user response:

map works like zip: it stops as soon as the shortest input sequence is exhausted. Your [global_DataFrame] and [global_String] lists have one element each, so map makes a single call and ends there.
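The truncation can be seen in a small sketch. The function tag and the label "fixed" are made up for illustration, and ThreadPoolExecutor is used only so the snippet runs anywhere without a main guard; the argument handling of map should be the same for ProcessPoolExecutor, since both inherit it from Executor.

```python
from concurrent.futures import ThreadPoolExecutor


def tag(x, label):
    # Hypothetical stand-in for the real worker function
    return f"{label}:{x}"


p = [1, 3, 4, 5]

# zip stops at the shortest input, so only one pair is produced
pairs = list(zip(p, ["fixed"]))

with ThreadPoolExecutor() as executor:
    # Executor.map pairs its iterables the same way, so tag runs once
    results = list(executor.map(tag, p, ["fixed"]))

print(pairs)
print(results)
```

Only the first element of p is ever paired with the one-element list, which matches the single call seen in the question.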

There are two ways around this:

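  1. Use itertools.product. This is the equivalent of running "for all p, for all data frames, for all strings", so every element of p gets paired with the single data frame and string. The function then receives one tuple per call. Something like this:
import itertools

def goal_fcn(x_DataFrame_String):
    # product yields one (x, DataFrame, String) tuple per call
    x, DataFrame, String = x_DataFrame_String
    ...

executor.map(goal_fcn, itertools.product(p, [global_DataFrame], [global_String]))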
  2. Bind the fixed arguments with functools.partial instead of abusing the sequence arguments.
import functools

def goal_fcn(x, DataFrame, String):
    pass

# only x varies; DataFrame and String are fixed for every call
bound = functools.partial(goal_fcn, DataFrame=global_DataFrame, String=global_String)
executor.map(bound, p)
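A minimal end-to-end sketch of the second option might look like the following. The body of goal_fcn, the dict standing in for the data frame, and the value "run" are all assumptions made so the example is self-contained. ThreadPoolExecutor is used here only so it runs without a main guard; with ProcessPoolExecutor the same map call should apply, provided goal_fcn is defined at module level and its arguments can be pickled.

```python
import functools
from concurrent.futures import ThreadPoolExecutor


def goal_fcn(x, DataFrame, String):
    # Hypothetical stand-in for heavy_calculation(x, DataFrame, String)
    return f"{String}:{x * DataFrame['scale']}"


global_DataFrame = {"scale": 10}  # assumption: a dict in place of a real DataFrame
global_String = "run"             # assumption: illustrative value
p = [1, 3, 4, 5]

# Fix the shared arguments once; only x varies between calls
bound = functools.partial(goal_fcn, DataFrame=global_DataFrame, String=global_String)

with ThreadPoolExecutor() as executor:
    # map yields results in input order, regardless of completion order
    results = list(executor.map(bound, p))

print(results)
```

Because map returns results in the order of p, there is no need for as_completed to keep inputs and outputs aligned.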