I've been trying to optimize the construction of matrix C (see below) by using NumPy. How could my code be further optimized so as to make the building of matrix C faster?
Given the following matrixes:
Q: array([[78.66 , 47.196 , 31.464 ],
[40.3875, 24.2325, 16.155 ],
[40.4775, 24.2865, 16.191 ],
...,
[55.62 , 33.372 , 22.248 ],
[76.7475, 46.0485, 30.699 ],
[77.3325, 46.3995, 30.933 ]])
S: [[[1,2,3],[],[],[1,...,1125]],
[[],[1,...,200],[300,301][]],
...,
[[1,1125],[],[12],[345,453]]]
gamma: array([[0. , 1.4, 2.5, 3. , 3. ],
[0. , 1.6, 3. , 3.7, 4. ],
[0. , 1.8, 3.5, 4.4, 5. ]])
I have the following code to build the three dimensional matrix C
# # Matrix C_ijk
C = np.zeros((n,o,p))
for i in range(n):
for j in range(o):
for k in range(p):
for u in range(m-1):
if np.isin(i,S[j][u]):
C[i,j,k] = Q[j,k] * gamma[k,u 1]
Edit: m
,n
,o
and p
are integers which define the dimensional length of the matrixes. These are 5, 1126, 797 and 3 in the full model respectively.
Q is size (o,p)
;
S is size (o,m-1)
;
gamma is size (p,m-1)
;
C is size (n,o,p)
;
Here is a small example output:
>>> Q
array([[10., 10.],
[20., 20.],
[30., 30.],
[30., 30.]])
>>> S
[[[0, 1], [], [], [2]], [[2], [0], [1], []], [[], [1], [0, 2], []], [[], [2], [], [0, 1]]]
>>> gamma
array([[0. , 0.575, 1.2 , 1.75 , 2. ],
[0. , 0.625, 1.4 , 2.25 , 3. ]])
>>> C
array([[[ 5.75, 6.25],
[24. , 28. ],
[52.5 , 67.5 ],
[60. , 90. ]],
[[ 5.75, 6.25],
[35. , 45. ],
[36. , 42. ],
[60. , 90. ]],
[[20. , 30. ],
[11.5 , 12.5 ],
[52.5 , 67.5 ],
[36. , 42. ]]])
As suggested by @AhmedMohamedAEK would the implementation of numba in the following way be correct?
from numba import njit, prange
@njit(parallel=True)
def matrix_C(Q,S,gamma,n,o,p,m):
C = np.zeros((n,o,p))
for i in prange(n):
for j in prange(o):
for u in prange(m-1):
if np.isin(i,S[j][u]):
C[i,j,:] = Q[j,:] * gamma[:,u 1]
return C
C = matrix_C(Q,S,gamma,n,o,p,m)
CodePudding user response:
you can remove the loop in k since it's used by all of the arrays as follows:
# # Matrix C_ijk
C = np.zeros((n,o,p))
for i in range(n):
for j in range(o):
for u in range(m-1):
if np.isin(i,S[j][u]):
C[i,j,:] = Q[j,:] * gamma[:,u 1]
however using nested loops in python is very discouraged, and looping should be moved to the C side using an external module, which can be created using cython or numba.
edit: for the numba implementation above, if the array is very huge (a few MBs of size) then you could use a parallel implementation as follows:
from numba import njit, prange
@njit(parallel=True)
def matrix_C(Q,S,gamma,n,o,p,m):
C = np.zeros((n,o,p))
for i in prange(n):
for j in range(o):
for k in range(p):
for u in range(m-1):
if np.isin(i,S[j][u]):
C[i,j,k] = Q[j,k] * gamma[k,u 1]
return C
C = matrix_C(Q,S,gamma,n,o,p,m)
however if the array is relatively smaller then i'd just remove parallel and prange and just use the following:
@njit()
def matrix_C(Q,S,gamma,n,o,p,m):
C = np.zeros((n,o,p))
for i in range(n):
for j in range(o):
for k in range(p):
for u in range(m-1):
if np.isin(i,S[j][u]):
C[i,j,k] = Q[j,k] * gamma[k,u 1]
return C
C = matrix_C(Q,S,gamma,n,o,p,m)
and remember the first time you call the function is when it will be compiled, so it will be a little slow, but further calls will be fast.
CodePudding user response:
What's particularly hurting you - aside from the Python loops - is that linear time np.isin
lookup in your innermost loop. At the cost of a little more space, this issue is trivial to remedy. We can create a mask that stores whether i
is in S[j][u]
, so we do not need to search for it later on.
This is achieved by the following code, creating the mask array R
.
R = np.zeros((n, o, m))
for j in range(o):
for u in range(m - 1):
for i in S[j][u]:
R[i, j, u] = i < n
Notice further that your computation of the matrix C
only adds meaningful work when np.isin(i, S[j][u])
. We can determine such i, j, u
using np.argwhere(R)
, which should be a lot faster. This results in the following straightforward computation:
for i, j, u in np.argwhere(R):
C[i, j] = Q[j] * gamma[:, u 1]
Since all that work is done using numpy, this should be considerably faster than your initial implementation.
Full code:
def matrix_C(Q, S, gamma, n, o, p, m):
C = np.zeros((n, o, p))
R = np.zeros((n, o, m))
for j in range(o):
for u in range(m - 1):
for i in S[j][u]:
R[i, j, u] = i < n
for i, j, u in np.argwhere(R):
C[i, j] = Q[j] * gamma[:, u 1]
return C
Hint: this can be sped up further. Only the largest value of u
is used to create C[i, j]
- earlier entries due to lower values of u
are overwritten. Can you think of a way in which you can determine this value immediately as you build R
? How would that help?