Code:
from multiprocessing import Pool
from itertools import repeat
import timeit
import multiprocessing

def add_one(number, flag):
    new_number = None
    if flag == "a":
        new_number = number + 1
    return (number, new_number)

numbers = list(range(10000000))
pool = Pool(multiprocessing.cpu_count())
for i in range(3):
    print(i)
    start_time = timeit.default_timer()
    flag = "a"
    new_numbers = pool.starmap(add_one, zip(numbers, repeat(flag)))
    print('time taken: ', timeit.default_timer() - start_time)
The pool worker counts for the three runs are 3, 1, and multiprocessing.cpu_count() respectively, and the times taken are below:
(base) ins-MacBook-Pro-2 graph_test % python test.py
0
time taken: 7.543301321
1
time taken: 7.8004514
2
time taken: 7.892797112
ins-MacBook-Pro-2 graph_test % python test.py
0
time taken: 11.030308790000001
1
time taken: 11.616422934
2
time taken: 11.846459496999998
ins-MacBook-Pro-2 graph_test % python test.py
0
time taken: 6.376773281
1
time taken: 6.876658618999999
2
time taken: 6.518348029
My Mac has 8 cores. It doesn't seem to speed up a lot. Is my way of using starmap correct?
CodePudding user response:
multiprocessing won't yield significant speedups for fine-grained parallelism - you need to do "big(ger) work" for it to pay off in wall-clock time.
The work done by an invocation of your add_one() function is simply trivial compared to the overheads of using multiprocessing at all. Under the covers, the main program has to pickle the arguments and send those strings over an interprocess communication mechanism; then the worker has to unpickle the arguments; then the function does a tiny amount of work to compare a string to "a" and possibly add 1 to an integer, then build a 2-tuple of results; then the multiprocessing machinery has to, again under the covers, pickle that 2-tuple and send the pickle back over an interprocess communication mechanism to the main program; then the main program has to unpickle the result and append the resulting 2-tuple to a list.
You "see" very little of the work going on to support all this, but the support work costs, in all, much more than what an invocation of add_one() does.
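As a rough illustration of where the time goes, the pickle round-trips that multiprocessing performs for every call can be timed against the bare function call itself. This is only a sketch - the real machinery also pushes the pickled bytes through pipes and queues, which adds still more overhead on top of what is measured here:

```python
import pickle
import timeit

def add_one(number, flag):
    new_number = None
    if flag == "a":
        new_number = number + 1
    return (number, new_number)

args = (12345, "a")

# The bare function call, 100,000 times.
t_call = timeit.timeit(lambda: add_one(*args), number=100000)

# What multiprocessing does per call, roughly: pickle the arguments,
# unpickle them in the worker, call the function, pickle the result,
# unpickle it back in the main program.
t_ipc = timeit.timeit(
    lambda: pickle.loads(pickle.dumps(add_one(*pickle.loads(pickle.dumps(args))))),
    number=100000,
)

print(f"bare call: {t_call:.3f}s   with pickle round-trips: {t_ipc:.3f}s")
```

Even without any actual interprocess communication, the serialization alone dwarfs the cost of the function body.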
To see speedups more in line with the number of cores in use, add_one() needs to take more time all on its own, so there's more potential to do work in parallel. For example, change:
if flag == "a":
to
if (flag * 100000)[0] == "a":
No, that's not a sensible change. It's just a way to make add_one() consume more time, so you can see that adding cores really does help then. It doesn't change the overheads at all; it just increases the amount of work that needs to be done outside of the multiprocessing implementation. Building all those useless "big strings" can - and will - be done in parallel.
Note: don't mistake "cores" for "physical CPUs". multiprocessing.cpu_count() generally returns the number of "logical" (not physical) cores. For multiprocessing, the number of physical cores is typically much more important.
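A quick way to see what the standard library reports (the psutil call below is shown only as a commented-out suggestion - psutil is a third-party package, not part of the standard library):

```python
import multiprocessing

# Logical core count: what Pool(multiprocessing.cpu_count()) uses.
# On a machine with hyper-threading/SMT this is typically twice the
# number of physical cores - e.g. an 8-logical-core Mac may have
# only 4 physical cores available for truly parallel CPU-bound work.
logical = multiprocessing.cpu_count()
print("logical cores:", logical)

# The physical core count is not exposed by the stdlib; a third-party
# package such as psutil can report it:
#   import psutil
#   physical = psutil.cpu_count(logical=False)
```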