multiprocessing ThreadPool module increases processing time as the number of threads increases


This is actually a continuation of my previous question: Reduce time complexity of nested loop

I want to add multithreading to my Pearson's r algorithm. Here, I divide the matrix X into n/t x n segments and the vector y into n/t x 1 segments, and feed each pair of segments to a thread. I am currently using ThreadPool from the multiprocessing library as an alternative to Python's threading library, because during testing I noticed that threading does not really run the threads simultaneously; upon further searching on the internet, this is due to the GIL (global interpreter lock). People suggested the multiprocessing library, and the code below is what I have done so far.

pearson_cor: This function computes Pearson's r; it accepts a matrix X, a vector y, and the sizes (m, n). You may see the code in the link above.
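
For readers without the link, here is a minimal sketch of what such a function might compute for the 1-D segments passed in the snippet below. This is an assumption for illustration only; the actual implementation (which also takes the (m, n) sizes) is in the linked question.

import numpy as np

def pearson_cor(x, y):
    # Pearson's r for two equal-length 1-D arrays:
    # covariance of the centered arrays divided by the product
    # of their standard-deviation terms.
    xc = x - x.mean()
    yc = y - y.mean()
    return (xc @ yc) / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())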

My concern here is that the processing time still increases as the number of threads increases (see the sample output below). It is noticeable from my tests that the time decreases from t = 2 to t = 4 and starts to increase again from t = 8 up to t = 64. Although these runs are still faster than the single-threaded one, I expected the times to keep decreasing as more computations are processed at the same time.

I want to know why this is happening. I am a newbie at multiprocessing and only started learning it last week. Are there any mistakes in, or anything missing from, my implementation? Is this still related to the GIL? If so, do you have any suggestions on how to implement multithreading more efficiently?

import numpy as np
from multiprocessing.pool import ThreadPool

n = int(input("n = "))
t = int(input("t = "))
correct_rs = []

y = np.random.randint(0, 100, size=n)
X = np.random.randint(0, 100, size=(n, n))

# Split y into t segments and each row of X into t segments.
split_y = np.array_split(y, t)
split_x = []
for i in range(n):
    split_x.append(np.array_split(X[i], t))

for i in range(n):
    threads = [None] * t
    ret = []
    for j in range(t):
        # A new pool is created for every single task here.
        pool = ThreadPool(processes=t)
        threads[j] = pool.apply_async(pearson_cor, args=(split_x[i][j], split_y[j]))
        # .get() blocks until this task finishes before the next one is submitted.
        ret.append(threads[j].get())

Sample output (number of threads = elapsed time in seconds):

1 = 30.79026460647583
2 = 19.61565899848938
**4 = 22.66206121444702**
8 = 26.578197240829468
16 = 27.43799901008606
32 = 29.007505416870117
64 = 29.55176091194153

1 = 82.63879776000977
2 = 71.86883449554443
4 = 66.2829954624176
**8 = 72.7975389957428**
16 = 74.40859937667847
32 = 79.7437674999237
64 = 82.5101261138916

1 = 3.117418050765991
2 = 2.9685776233673096
4 = 2.442412853240967
**8 = 2.6580233573913574**
16 = 2.7630186080932617
32 = 2.747586727142334
64 = 2.768829345703125

CodePudding user response:

One thing to consider is CPU thrashing (https://stackoverflow.com/a/20842853/6014330): depending on various factors, the CPU can spend more time context-switching than actually running your code.

Another thing: if your CPU only has 8 hardware threads available and you try to use 64 threads, you're just running those 64 threads 8 at a time and constantly switching between them. The best you can do is match the number of worker threads to the number of threads your CPU supports; anything beyond that is just wasted overhead. A minimal sketch of that idea follows.
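
As a sketch of that fix (assuming the pearson_cor function and the n, t, split_x, split_y variables from the question), size a single reusable pool to the machine's core count and submit all segments before collecting any results:

import os
from multiprocessing.pool import ThreadPool

# One pool sized to the CPU, created once, instead of a new pool per task.
n_workers = os.cpu_count()

with ThreadPool(processes=n_workers) as pool:
    for i in range(n):
        # Submit all t segments of row i first...
        async_results = [
            pool.apply_async(pearson_cor, args=(split_x[i][j], split_y[j]))
            for j in range(t)
        ]
        # ...and only then call get(), so the tasks can run concurrently.
        ret = [r.get() for r in async_results]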
