Optimize the use of numpy for creating a 3d-matrix-CodePudding

I've been trying to optimize the construction of matrix C (see below) by using NumPy. How could my code be further optimized so as to make the building of matrix C faster?

Given the following matrixes:

Q:   array([[78.66  , 47.196 , 31.464 ],
           [40.3875, 24.2325, 16.155 ],
           [40.4775, 24.2865, 16.191 ],
           ...,
           [55.62  , 33.372 , 22.248 ],
           [76.7475, 46.0485, 30.699 ],
           [77.3325, 46.3995, 30.933 ]])

S:  [[[1,2,3],[],[],[1,...,1125]],
     [[],[1,...,200],[300,301][]],
     ...,
     [[1,1125],[],[12],[345,453]]]

gamma:   array([[0. , 1.4, 2.5, 3. , 3. ],
               [0. , 1.6, 3. , 3.7, 4. ],
               [0. , 1.8, 3.5, 4.4, 5. ]])

I have the following code to build the three dimensional matrix C

# # Matrix C_ijk 
C = np.zeros((n,o,p))
for i in range(n):
    for j in range(o):
        for k in range(p):
            for u in range(m-1):
                if np.isin(i,S[j][u]):
                    C[i,j,k] = Q[j,k] * gamma[k,u 1]

Edit: m,n,o and p are integers which define the dimensional length of the matrixes. These are 5, 1126, 797 and 3 in the full model respectively.

Q is size (o,p) ; S is size (o,m-1) ; gamma is size (p,m-1) ; C is size (n,o,p) ;

Here is a small example output:

>>> Q
array([[10., 10.],
       [20., 20.],
       [30., 30.],
       [30., 30.]])
>>> S
[[[0, 1], [], [], [2]], [[2], [0], [1], []], [[], [1], [0, 2], []], [[], [2], [], [0, 1]]]
>>> gamma
array([[0.   , 0.575, 1.2  , 1.75 , 2.   ],
       [0.   , 0.625, 1.4  , 2.25 , 3.   ]])
>>> C
array([[[ 5.75,  6.25],
        [24.  , 28.  ],
        [52.5 , 67.5 ],
        [60.  , 90.  ]],

       [[ 5.75,  6.25],
        [35.  , 45.  ],
        [36.  , 42.  ],
        [60.  , 90.  ]],

       [[20.  , 30.  ],
        [11.5 , 12.5 ],
        [52.5 , 67.5 ],
        [36.  , 42.  ]]])

As suggested by @AhmedMohamedAEK would the implementation of numba in the following way be correct?

from numba import njit, prange

@njit(parallel=True)
def matrix_C(Q,S,gamma,n,o,p,m):
    C = np.zeros((n,o,p))
    for i in prange(n):
        for j in prange(o):
            for u in prange(m-1):
                if np.isin(i,S[j][u]):
                    C[i,j,:] = Q[j,:] * gamma[:,u 1]
    return C
C = matrix_C(Q,S,gamma,n,o,p,m)

CodePudding user response：

you can remove the loop in k since it's used by all of the arrays as follows:

# # Matrix C_ijk 
C = np.zeros((n,o,p))
for i in range(n):
    for j in range(o):
        for u in range(m-1):
            if np.isin(i,S[j][u]):
                C[i,j,:] = Q[j,:] * gamma[:,u 1]

however using nested loops in python is very discouraged, and looping should be moved to the C side using an external module, which can be created using cython or numba.

edit: for the numba implementation above, if the array is very huge (a few MBs of size) then you could use a parallel implementation as follows:

from numba import njit, prange

@njit(parallel=True)
def matrix_C(Q,S,gamma,n,o,p,m):
    C = np.zeros((n,o,p))
    for i in prange(n):
        for j in range(o):
            for k in range(p):
                for u in range(m-1):
                    if np.isin(i,S[j][u]):
                        C[i,j,k] = Q[j,k] * gamma[k,u 1]
    return C
C = matrix_C(Q,S,gamma,n,o,p,m)

however if the array is relatively smaller then i'd just remove parallel and prange and just use the following:

@njit()
def matrix_C(Q,S,gamma,n,o,p,m):
    C = np.zeros((n,o,p))
    for i in range(n):
        for j in range(o):
            for k in range(p):
                for u in range(m-1):
                    if np.isin(i,S[j][u]):
                        C[i,j,k] = Q[j,k] * gamma[k,u 1]
    return C
C = matrix_C(Q,S,gamma,n,o,p,m)

and remember the first time you call the function is when it will be compiled, so it will be a little slow, but further calls will be fast.

CodePudding user response：

What's particularly hurting you - aside from the Python loops - is that linear time np.isin lookup in your innermost loop. At the cost of a little more space, this issue is trivial to remedy. We can create a mask that stores whether i is in S[j][u], so we do not need to search for it later on.

This is achieved by the following code, creating the mask array R.

R = np.zeros((n, o, m))

for j in range(o):
    for u in range(m - 1):
        for i in S[j][u]:
            R[i, j, u] = i < n

Notice further that your computation of the matrix C only adds meaningful work when np.isin(i, S[j][u]). We can determine such i, j, u using np.argwhere(R), which should be a lot faster. This results in the following straightforward computation:

for i, j, u in np.argwhere(R):
    C[i, j] = Q[j] * gamma[:, u   1]

Since all that work is done using numpy, this should be considerably faster than your initial implementation.

Full code:

def matrix_C(Q, S, gamma, n, o, p, m):
    C = np.zeros((n, o, p))
    R = np.zeros((n, o, m))

    for j in range(o):
        for u in range(m - 1):
            for i in S[j][u]:
                R[i, j, u] = i < n

    for i, j, u in np.argwhere(R):
        C[i, j] = Q[j] * gamma[:, u   1]

    return C

Hint: this can be sped up further. Only the largest value of u is used to create C[i, j] - earlier entries due to lower values of u are overwritten. Can you think of a way in which you can determine this value immediately as you build R? How would that help?