I have a list of random numbers and I would like to get the greatest number using multiprocessing.
This is the code I used to generate the list:
import random
randomlist = []
for i in range(100000000):
    n = random.randint(1,30000000)
    randomlist.append(n)
To get the greatest number using a serial process:
import time
greatest = 0 # global variable
def f(n):
    global greatest
    if n > greatest:
        greatest = n
if __name__ == "__main__":
    t2 = time.time()
    greatest = 0
    for x in randomlist:
        f(x)
    print("serial process took:", time.time()-t2)
    print("greatest = ", greatest)
This is my try to get the greatest number using multiprocessing:
from multiprocessing import Pool
import time
greatest = 0 # the global variable
def f(n):
    global greatest
    if n > greatest:
        greatest = n
if __name__ == "__main__":
    greatest = 0
    t1 = time.time()
    p = Pool() #(processes=3)
    result = p.map(f,randomlist)
    p.close()
    p.join()
    print("pool took:", time.time()-t1)
    print("greatest = ", greatest)
The output here is 0, so the global variable is evidently not shared between the worker processes. How can I fix this without hurting performance?
CodePudding user response:
As suggested by @Barmar, divide your randomlist into chunks, compute the local maximum of each chunk, and finally compute the global maximum from local_maximum_list:
import multiprocessing as mp
import numpy as np
import random
import time
CHUNKSIZE = 10000
def local_maximum(l):
    m = max(l)
    print(f"Local maximum: {m}")
    return m
if __name__ == '__main__':
    randomlist = np.random.randint(1, 30000000, 100000000)
    start = time.time()
    # Lazily slice the data into CHUNKSIZE-sized pieces
    chunks = (randomlist[i:i + CHUNKSIZE]
              for i in range(0, len(randomlist), CHUNKSIZE))
    with mp.Pool(mp.cpu_count()) as pool:
        local_maximum_list = pool.map(local_maximum, chunks)
    print(f"Global maximum: {max(local_maximum_list)}")
    end = time.time()
    print(f"MP Elapsed time: {end-start:.2f}s")
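The original attempt printed 0 because each worker process updates its own copy of greatest, so the assignment never reaches the parent process; results have to be returned from the workers instead. The same chunk-and-reduce idea can be sketched without NumPy as follows. The helper names split_into_chunks and chunked_max, and the map_fn parameter, are illustrative assumptions and not part of the code above; the list size is reduced to keep the demo quick.

```python
from multiprocessing import Pool
import random


def split_into_chunks(data, size):
    """Return consecutive slices of `data`, each at most `size` items long."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunked_max(data, size, map_fn=map):
    """Take the max of each chunk via `map_fn`, then the max of those results.

    `map_fn` is the built-in map by default (serial); pass `pool.map` to
    spread the per-chunk work across processes. `data` must be non-empty,
    otherwise max() raises ValueError.
    """
    local_maxima = list(map_fn(max, split_into_chunks(data, size)))
    return max(local_maxima)


if __name__ == "__main__":
    data = [random.randint(1, 30_000_000) for _ in range(200_000)]
    with Pool() as pool:
        # Each worker returns its chunk's maximum to the parent process.
        result = chunked_max(data, 10_000, map_fn=pool.map)
    print("greatest =", result)
```

Passing the mapping function in as a parameter keeps the reduction logic identical for the serial and the pooled run, which makes it easy to compare the two timings on the same data.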
Performance
Interestingly, how the random list is created has a large impact on the multiprocessing run time:
Scenario 1:
randomlist = np.random.randint(1, 30000000, 100000000)
MP Elapsed time: 1.63s
Scenario 2:
randomlist = np.random.randint(1, 30000000, 100000000).tolist()
MP Elapsed time: 6.02s
Scenario 3:
randomlist = [random.randint(1, 30000000) for _ in range(100000000)]
MP Elapsed time: 7.14s
Scenario 4:
randomlist = list(np.random.randint(1, 30000000, 100000000))
MP Elapsed time: 184.28s
Scenario 5:
randomlist = []
for _ in range(100000000):
    n = random.randint(1, 30000000)
    randomlist.append(n)
MP Elapsed time: 7.52s