Multiprocessing pool outputting <multiprocessing.pool.IMapIterator object at ...>

I am trying to optimize a simple coin flip program that calculates the probability of heads and tails by simulating coin flips (mostly to learn multiprocessing), and I keep getting the output <multiprocessing.pool.IMapIterator object at ...>, where ... is a seemingly random string of letters and numbers. I know I need to turn each object into an integer, but I'm not sure how to do so.

import random
from multiprocessing import Pool

def rolldie(): # 1 = heads, 2 = tails
    dices = int(random.randrange(1,3))
    return(int(dices))

def main():
    out = []
    if __name__ == '__main__':
        pool = Pool()
        for i in range(100): #this can be any number
            out.append(pool.imap(rolldie(), 0))
        return(out)

print(main())

Answer:

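Your worker function doesn't need any input, so you can use apply_async to schedule each task and then call get() on each result to collect the values. (In the version below, rolldie takes an idx argument only so the log shows which task is starting and finishing, and I've added some sleep time to simulate a long-running job.)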

test.py:

import random
import time

from multiprocessing import Pool


def rolldie(idx):  # 1 = heads, 2 = tails
    t = random.uniform(1, 3)

    print(f"START: {idx} ({t:.3f}s)")

    time.sleep(t)
    i = random.randint(1, 2)

    print(f"END: {idx}")

    return i


def main():
    results, out = [], []

    with Pool() as pool:
        for i in range(10):
            results.append(pool.apply_async(rolldie, args=(i,)))

        for result in results:
            out.append(result.get())

    print(out)


if __name__ == "__main__":
    main()

Test:

$ python test.py
START: 0 (2.737s)
START: 1 (2.692s)
START: 2 (1.405s)
START: 3 (1.397s)
END: 3
START: 4 (2.537s)
END: 2
START: 5 (2.472s)
END: 1
START: 6 (2.373s)
END: 0
START: 7 (1.262s)
END: 5
START: 8 (1.416s)
END: 4
START: 9 (2.151s)
END: 7
END: 6
END: 8
END: 9
[2, 2, 2, 2, 2, 1, 1, 2, 2, 1]

The machine I ran the test on has only 4 CPUs, so Pool() started 4 worker processes: only 4 jobs were scheduled at the beginning, and the remaining ones waited in the queue until a worker became free.
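Because apply_async does not require any arguments, the question's original zero-argument rolldie can also be scheduled directly. The sketch below does that and additionally records the PID of the worker that ran each task, which makes the pool size visible: with processes=2, at most two distinct workers should show up. Returning the PID alongside the outcome is an addition for illustration only, not part of the answer above.

```python
import os
import random
from multiprocessing import Pool


def rolldie():
    # No arguments, like the question's version.
    # Returns (outcome, worker PID); outcome 1 = heads, 2 = tails.
    return random.randint(1, 2), os.getpid()


def main(n_trials=20, processes=None):
    # processes=None (the default) starts one worker per available CPU.
    with Pool(processes=processes) as pool:
        # apply_async needs no args tuple for a zero-argument function.
        async_results = [pool.apply_async(rolldie) for _ in range(n_trials)]
        # get() blocks until each task's result is ready.
        pairs = [r.get() for r in async_results]
    outcomes = [outcome for outcome, _ in pairs]
    workers = {pid for _, pid in pairs}
    return outcomes, workers


if __name__ == "__main__":
    outcomes, workers = main(processes=2)
    print(outcomes)
    print(f"{len(workers)} distinct worker process(es) handled {len(outcomes)} tasks")
```

The results are collected inside the with block because leaving it terminates the pool.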

Answer:

There are several problems with your code that I would like to review with you. You can use the method multiprocessing.Pool.imap, but not the way you are using it.

First, the if __name__ == '__main__': check needs to be moved so that the call to main() is inside the block. On platforms that use spawn to start new processes (Windows, and macOS since Python 3.8), each process the pool starts re-imports your script and executes every statement at global level, including the print(main()) statement. With the if __name__ == '__main__': check where you currently have it (inside main), that call simply returns None in each child process, so None gets printed N times, where N is the number of processes started in the pool.
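To see the re-import described above on any platform, you can force the spawn start method. This is a minimal sketch; the square worker is a hypothetical stand-in for rolldie. The module-level print should appear once for the parent and once for each of the two workers, while the code under the guard runs only in the parent.

```python
import multiprocessing as mp
import os

# Module-level statement: with "spawn" it runs in the parent and again in
# every child, because each child re-imports this file.
print(f"module-level code running in PID {os.getpid()}")


def square(x):
    # Worker functions must be defined at module level so children can import them.
    return x * x


if __name__ == "__main__":
    # Force "spawn" (the default on Windows and macOS) to reproduce the
    # behaviour elsewhere; set_start_method may only be called once.
    mp.set_start_method("spawn")
    with mp.Pool(2) as pool:
        print(pool.map(square, range(4)))
```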

Next, you have pool.imap(rolldie(), 0). The first argument to imap should be a function, but here you are calling the function, so what you actually pass as the first argument is the value rolldie() returns (1 or 2). That is clearly not correct. The second argument to imap should be an iterable, and 0 is not one. The pool calls rolldie once for each element of the iterable, passing that element as the argument, so rolldie has to be modified to take a single argument. We could call it trial_number; rolldie can simply ignore it.

The return value from imap is itself an iterable (the IMapIterator object you are seeing printed), which yields the values rolldie returned when you iterate over it. You can either iterate over it and handle the results as they arrive, or just convert it into a list all at once:

import random
from multiprocessing import Pool

def rolldie(trial_number):  # 1 = heads, 2 = tails; trial_number is ignored
    return random.randint(1, 2)

def main():
    # The with block shuts the pool down once all results have been collected
    with Pool() as pool:
        return list(pool.imap(rolldie, range(100)))

if __name__ == '__main__':
    print(main())

Prints:

[2, 1, 2, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2, 1, 2, 2, 2, 2, 1, 1, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 2, 1, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2]
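The code above converts the IMapIterator into a list in one go. The other option, iterating over it and handling each result as it arrives, might look like the sketch below, which also tallies heads and tails to estimate the probabilities the question is after. The estimate function and the chunksize value are illustrative choices, not part of the answer. With imap's default chunksize of 1, every trial is sent to a worker as a separate message, so for a task this small batching the trials cuts most of the inter-process overhead.

```python
import random
from collections import Counter
from multiprocessing import Pool


def rolldie(trial_number):  # 1 = heads, 2 = tails; trial_number is ignored
    return random.randint(1, 2)


def estimate(n_trials=100_000, chunksize=1_000):
    counts = Counter()
    with Pool() as pool:
        # Looping over the IMapIterator yields results in input order,
        # each one as soon as it (and all earlier ones) are available.
        for outcome in pool.imap(rolldie, range(n_trials), chunksize=chunksize):
            counts[outcome] += 1
    return {"heads": counts[1] / n_trials, "tails": counts[2] / n_trials}


if __name__ == "__main__":
    print(estimate())
```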

According to the documentation on random.seed(a=None, version=2):

If a is omitted or None, the current system time is used. If randomness sources are provided by the operating system, they are used instead of the system time (see the os.urandom() function for details on availability).

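If the processes in the pool initialized their random number generators identically (for example, from the same system time, or by inheriting the parent's generator state when they are forked), every worker would produce the same sequence of random numbers. This is not good! With the standard random module this is largely a precaution, since a spawned child seeds the module afresh when it imports it and, since Python 3.7, the module reseeds itself in a forked child; generators that are not reseeded after a fork, such as NumPy's legacy global generator, can actually run into this problem.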

To be safe, we can make sure each process initializes its random number generator uniquely. The initializer argument of the multiprocessing.Pool constructor specifies a function that is called once in each process of the pool when it starts. Here we seed the random number generator with the current process's process ID:

import random
from multiprocessing import Pool, current_process

def init_pool():
    # Runs once in each worker process: give each worker a distinct seed
    random.seed(current_process().pid)

def rolldie(trial_number):  # 1 = heads, 2 = tails; trial_number is ignored
    return random.randint(1, 2)

def main():
    with Pool(initializer=init_pool) as pool:
        return list(pool.imap(rolldie, range(100)))

if __name__ == '__main__':
    print(main())

Prints:

[2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 1, 2]
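Seeding with the PID makes the workers differ, but the results still change from run to run and depend on which worker happens to pick up which trial. If you also want reproducible runs, one alternative (not what the answer above does) is to give each trial its own generator derived from a base seed and the trial number, so the outcome no longer depends on scheduling at all. BASE_SEED is a hypothetical name, and creating a generator per trial costs some speed in exchange for reproducibility.

```python
import random
from multiprocessing import Pool

BASE_SEED = 12345  # hypothetical base seed; change it to get a different run


def rolldie(trial_number):  # 1 = heads, 2 = tails
    # A private generator seeded from (BASE_SEED, trial_number): the result
    # depends only on the trial, not on which worker process runs it.
    rng = random.Random(f"{BASE_SEED}-{trial_number}")
    return rng.randint(1, 2)


def main(n_trials=100):
    with Pool() as pool:
        return pool.map(rolldie, range(n_trials))


if __name__ == "__main__":
    print(main())
```

String seeds are hashed deterministically by random.Random, so the same BASE_SEED gives the same list on every run, and even when the trials are computed without a pool.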