Writing long code as nested list comprehension-CodePudding

I am writing a code IN Python to compute the discrete Laplacian as a sparse matrix in 2D. The code is as follows:

def discretizeLaplacian(N, step):
    NN = N**2
    dxx = (float) (1/((step)**2))
    i_loc, j_loc, vals = np.array([]), np.array([]), np.array([])
    for i in range(1, N-1):
        for j in range(1, N-1):      
            iv0 = j*N   i
            iv1 = (j - 1)*N   i
            iv2 = (j   1)*N   i
            iv3 = j*N   i - 1
            iv4 = j*N   i   1
            i_loc = np.concatenate((i_loc,[iv0, iv0, iv0, iv0, iv0]))
            j_loc = np.concatenate((j_loc,[iv0, iv1, iv2, iv3, iv4]))
            vals = np.concatenate((vals,dxx*np.array([-4, 1, 1, 1, 1])))
    sparseMatrix = csc_matrix((vals, (i_loc, j_loc)), shape = (NN, NN))
    return sparseMatrix

The for loop runs much longer compared to Matlab. I am wondering if I can write it as nested list comprehension to make it faster.

CodePudding user response：

The slow performance comes from the bad complexity of the algorithm. Indeed, the complexity of the original code is O(N**4)! This is due to np.concatenate which creates a new array by copying the old one and adding a few items at the end of the new one. This means that O(N**2) copies of 3 growing arrays are performed. In general, you should avoid np.concatenate in a loop to make a growing array. You should use Python lists in that case.

Note that you can use np.tile to repeat values of an array and pre-compute the constant dxx * np.array([-4, 1, 1, 1, 1]).

Here is the corrected code:

def discretizeLaplacian(N, step):
    NN = N**2
    dxx = (float) (1/((step)**2))
    tmp = dxx * np.array([-4, 1, 1, 1, 1])
    i_loc, j_loc = [], []
    for i in range(1, N-1):
        for j in range(1, N-1):   
            iv0 = j*N   I
            iv1 = (j - 1)*N   i
            iv2 = (j   1)*N   i
            iv3 = j*N   i - 1
            iv4 = j*N   i   1
            i_loc.extend([iv0, iv0, iv0, iv0, iv0])
            j_loc.extend([iv0, iv1, iv2, iv3, iv4])
    i_loc = np.array(i_loc, dtype=np.float64)
    j_loc = np.array(j_loc, dtype=np.float64)
    vals = np.tile(tmp, (N-2)**2)
    sparseMatrix = csc_matrix((vals, (i_loc, j_loc)), shape = (NN, NN))
    return sparseMatrix

Here are performance results on my machine with N=200:

Original code:   22.510 s
Corrected code:   0.068 s

While you can use list comprehensions to set i_loc with [j*N I for i in range(1, N-1) for j in range(1, N-1)], it is not easy for j_loc due to the multiple items to append. You can build a list of list/tuple and then flatten it, but it makes the code less readable and not faster.

If this is not fast enough, you can use Numpy vectorized functions (such as np.meshgrid, np.arange, and basic math operations) to build i_loc and j_loc. Please consider reading the Numpy tutorials about that. Python loops are generally very slow since Python codes are generally interpreted. Matlab uses internally a just-in-time compiler (JIT) to execute such a loop-based code quickly. You can use Numba or Cython to do the same thing in Python.