Numba parallel causing incorrect results in a for loop, I can't pinpoint the issue-CodePudding

So I have what appears to be a perfectly acceptable loop to make parallel. But when I pass it to Numba parallel, it always gives incorrect results. All that happens in the loop is an input matrix has one element set to 0, matrix multiplication occurs and populates a new matrix, then the element that was set to 0 is set back to its original value. It would appear the array a is getting modified in each dispatch of Numba, so I tried copying a to another variable inside the loop, modifying the copy only, yet obtain the same incorrect results (not shown). Here is a minimal example. I just don't see what the issue is, or how to fix it:

import numpy as np
from scipy.stats import random_correlation
import numba as nb

def myfunc(a, corr):
    b = np.zeros(a.shape[0])

    for i in range(b.shape[0]):
        temp = a[i]
        a[i] = 0
        b[i] = a@[email protected]
        a[i] = temp

    return b

@nb.njit(parallel=True)
def numbafunc(a, corr):
    b = np.zeros(a.shape[0])

    for i in nb.prange(b.shape[0]):
        temp = a[i]
        a[i] = 0
        b[i] = a@[email protected]
        a[i] = temp

    return b

if __name__ == '__main__':
    a = np.random.rand(10)
    corr = random_correlation.rvs(eigs=[2,2,1,1,1,1,0.5,0.5,0.5,0.5])

    b_1 = myfunc(a, corr)
    b_2 = numbafunc(a, corr)

    # check if serial and Numba results match off the same inputs
    print(np.isclose(b_1,b_2))

    # double check the original function returns the same results again..
    b_1_check = myfunc(a, corr)
    print(np.isclose(b_1, b_1_check))

Returns all false values, or at least 9/10 are false... Can anyone pinpoint which part of the code is problematic for parallelization? It looks fine to me. Much appreciated!

CodePudding user response：

There is a race condition in numbafunc. Indeed, a[i] = 0 modifies the array a shared between multiple threads reading/writing a for different i values. Storing the value in temp to restore it later only works in sequential, but not in parallel since threads can read a at any time.

To solve this issue, each thread should operate on its own copy of a:

@nb.njit(parallel=True)
def numbafunc(a, corr):
    b = np.zeros(a.shape[0])

    for i in nb.prange(b.shape[0]):
        c = a.copy()
        c[i] = 0.0
        b[i] = c @ corr @ c.T

    return b