How to get the greatest number in a list of numbers using multiprocessing

I have a list of random numbers and I would like to get the greatest number using multiprocessing.

This is the code I used to generate the list:

import random
randomlist = []
for i in range(100000000):
    n = random.randint(1,30000000)
    randomlist.append(n)

To get the greatest number using a serial process:

import time

greatest = 0 # global variable

def f(n):
    global greatest
    if n>greatest:
        greatest = n

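if __name__ == "__main__":
    # No `global` statement here: this block already runs at module scope,
    # and declaring `greatest` global after assigning it is a SyntaxError.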
    t2 = time.time()
    greatest = 0

    for x in randomlist:
        f(x)    
    
    print("serial process took:", time.time()-t2)
    print("greatest = ", greatest)

This is my try to get the greatest number using multiprocessing:

from multiprocessing import Pool
import time

greatest = 0 # the global variable

def f(n):
    global greatest
    if n>greatest:
        greatest = n

if __name__ == "__main__":
    # No `global` statement here: this block already runs at module scope.
    greatest = 0
    t1 = time.time()
    p = Pool() #(processes=3) 
    result = p.map(f,randomlist)
    p.close()
    p.join()
    print("pool took:", time.time()-t1)
    print("greatest = ", greatest)

The output here is 0. It is clear that the global variable is not shared between the worker processes: each one updates its own copy, and the parent's value never changes. How can I fix this without hurting performance?

CodePudding user response:

As suggested by @Barmar, divide your randomlist into chunks, compute the local maximum of each chunk, and finally compute the global maximum from local_maximum_list:

import multiprocessing as mp
import numpy as np
import random
import time

CHUNKSIZE = 10000

def local_maximum(l):
    m = max(l)
    print(f"Local maximum: {m}")
    return m

if __name__ == '__main__':
    randomlist = np.random.randint(1, 30000000, 100000000)

    start = time.time()
    # Lazily slice the data into consecutive chunks of CHUNKSIZE elements
    chunks = (randomlist[i:i + CHUNKSIZE]
                  for i in range(0, len(randomlist), CHUNKSIZE))

    with mp.Pool(mp.cpu_count()) as pool:
        local_maximum_list = pool.map(local_maximum, chunks)
    print(f"Global maximum: {max(local_maximum_list)}")
    end = time.time()
    print(f"MP Elapsed time: {end-start:.2f}s")
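The same chunk-then-reduce idea can be sketched with the chunking and the reduction split into small functions, so the logic can be checked on a tiny list without starting any processes. The names iter_chunks, chunked_max and parallel_max are hypothetical helpers introduced only for this illustration; the sketch assumes a non-empty sequence that supports len() and slicing.

```python
import multiprocessing as mp


def iter_chunks(values, chunksize):
    """Yield consecutive slices of `values`, each at most `chunksize` long."""
    for start in range(0, len(values), chunksize):
        yield values[start:start + chunksize]


def chunked_max(values, chunksize, map_fn=map):
    """Return max(values) by taking the maximum of the per-chunk maxima.

    `map_fn` defaults to the built-in map (serial). Passing `pool.map`
    instead runs the per-chunk step in worker processes.
    Raises ValueError if `values` is empty, like max() itself.
    """
    local_maxima = list(map_fn(max, iter_chunks(values, chunksize)))
    return max(local_maxima)


def parallel_max(values, chunksize=10_000, processes=None):
    """Same reduction, with each chunk's max computed in a process pool.

    The built-in max is used as the worker function, so only the chunks
    and one number per chunk travel between processes.
    """
    with mp.Pool(processes) as pool:
        return chunked_max(values, chunksize, map_fn=pool.map)


if __name__ == "__main__":
    data = [3, 17, 5, 42, 8, 23, 42, 1]
    # Serial check of the reduction on a small list
    print(chunked_max(data, chunksize=3))
```

With the real data, calling parallel_max(randomlist) under the `if __name__ == "__main__":` guard should follow the same pattern as the code above. No shared global is needed, because every worker returns its result instead of mutating state.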

Performance

It's very interesting how much the way the random list is created affects the performance of multiprocessing:

Scenario 1:
randomlist = np.random.randint(1, 30000000, 100000000)
MP Elapsed time: 1.63s

Scenario 2:
randomlist = np.random.randint(1, 30000000, 100000000).tolist()
MP Elapsed time: 6.02s

Scenario 3:
randomlist = [random.randint(1, 30000000) for _ in range(100000000)]
MP Elapsed time: 7.14s

Scenario 4:
randomlist = list(np.random.randint(1, 30000000, 100000000))
MP Elapsed time: 184.28s

Scenario 5:
randomlist = []
for _ in range(100000000):
    n = random.randint(1, 30000000)
    randomlist.append(n)
MP Elapsed time: 7.52s
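
One likely contributor to the gap between the scenarios, offered as an assumption rather than something measured in the timings above, is what each chunk contains when it is pickled and sent to a worker. A NumPy slice is a single array, .tolist() produces plain Python ints, and list() on an array produces a list of NumPy scalar objects. The sketch below only inspects element types and pickled sizes on a small array; the helper name element_type_and_pickle_size is hypothetical.

```python
import pickle

import numpy as np


def element_type_and_pickle_size(chunk):
    """Return (type name of the first element, pickled size of the chunk in bytes)."""
    return type(chunk[0]).__name__, len(pickle.dumps(chunk))


# A small array is enough to show the difference in representation
arr = np.random.randint(1, 30000000, 10000)

as_array = arr            # Scenario 1: chunk stays a NumPy array
as_tolist = arr.tolist()  # Scenario 2: list of plain Python ints
as_list = list(arr)       # Scenario 4: list of NumPy integer scalars

for label, chunk in [("array", as_array),
                     ("tolist()", as_tolist),
                     ("list()", as_list)]:
    print(label, element_type_and_pickle_size(chunk))
```

If this explanation holds, converting with .tolist() (or keeping the NumPy array) before chunking avoids the cost of serializing each NumPy scalar individually.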