Python Quadruple For Loop Performance

I have code that reads data and grabs specific values from an object's fields. How can I eliminate the quadruple for loop here? It is quite slow.

import numpy as np

data = readnek(filename) # read in data (readnek: Nek5000 reader, presumably pymech.neksuite.readnek)
bigNum=200000
for myNodeVal in range(0, 7):  # all 6 elements. 
    cs_coords = np.ones((bigNum, 2)) # initialize data
    counter = 0
    for iel in range(bigNum):
        for ix in range(0,7):
            for iy in range(0,7):
                z = data.elem[iel].pos[2, myNodeVal, iy, ix]  
                x = data.elem[iel].pos[0, myNodeVal, iy, ix] 
                y = data.elem[iel].pos[1, myNodeVal, iy, ix]
                cs_coords[counter, 0:2] = [x, y]
                counter += 1

Answer:

You can remove the two innermost loops by taking a transposed view of each 7x7 plane and reshaping it, so as to build a block of 49 [x, y] values that is then assigned to cs_coords in a vectorized way. The access to z can also be removed for better performance, since z is never used and the Python interpreter optimizes almost nothing away. Here is an (untested) example:

import numpy as np

data = readnek(filename) # read in data
bigNum=200000
for myNodeVal in range(0, 7):  # all 6 elements. 
    cs_coords = np.ones((bigNum * 49, 2)) # initialize data: 49 [x, y] rows per element
    counter = 0
    for iel in range(bigNum):
        arr = data.elem[iel].pos
        # Transpose so that ix varies slowest, matching the original ix/iy loop order
        view_x = arr[0, myNodeVal, 0:7, 0:7].T
        view_y = arr[1, myNodeVal, 0:7, 0:7].T
        cs_coords[counter:counter+49] = np.hstack([view_x.reshape(-1, 1), view_y.reshape(-1, 1)])
        counter += 49
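A quick way to check that the transposed reshape reproduces the original loop order is to run both versions on small mock data. The SimpleNamespace elements below are a hypothetical stand-in for the readnek output, assuming each pos array has shape (3, 7, 7, 7) indexed as [component, z, y, x]; the real data may differ.

```python
import numpy as np
from types import SimpleNamespace

# Hypothetical stand-in for readnek() output: a list of elements, each with a
# pos array of shape (3, nz, ny, nx) = (3, 7, 7, 7).
rng = np.random.default_rng(0)
n_el = 5
data = SimpleNamespace(
    elem=[SimpleNamespace(pos=rng.random((3, 7, 7, 7))) for _ in range(n_el)]
)
my_node_val = 3

# Reference: the question's loops, with counter += 1 and 49 rows per element
ref = np.ones((n_el * 49, 2))
counter = 0
for iel in range(n_el):
    pos = data.elem[iel].pos
    for ix in range(7):
        for iy in range(7):
            ref[counter] = [pos[0, my_node_val, iy, ix], pos[1, my_node_val, iy, ix]]
            counter += 1

# The answer's version: one vectorized block of 49 [x, y] rows per element
cs_coords = np.ones((n_el * 49, 2))
counter = 0
for iel in range(n_el):
    arr = data.elem[iel].pos
    view_x = arr[0, my_node_val, 0:7, 0:7].T
    view_y = arr[1, my_node_val, 0:7, 0:7].T
    cs_coords[counter:counter + 49] = np.hstack([view_x.reshape(-1, 1), view_y.reshape(-1, 1)])
    counter += 49

assert np.array_equal(cs_coords, ref)
```

The `.T` is what makes the orders match: the original code runs ix in the outer loop, so ix has to become the first axis before flattening. Because the transposed view is not contiguous, `reshape` makes a copy here, which is one of the small per-element allocations that still cost time.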

Note that the initial code is probably flawed: cs_coords.shape[0] is bigNum, but counter reaches bigNum * 49. You almost certainly need the shape (bigNum*49, 2) instead, so as to avoid out-of-bounds errors.

Note that the above code is still far from optimal: it creates many small arrays, and neither NumPy nor CPython is optimized for very small arrays. It is hard to do much better without more information on data. Using Numba or Cython can certainly help a lot to speed up this code. Still, even with such tools, the code will not be very efficient, since the memory access pattern is poor (bad cache locality) and the overall code will be memory-bound.
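One way to go further, assuming every element's pos array has the same shape (the question does not confirm this), is to copy the x and y coordinates of all elements into one regular array once, then do a single transpose and reshape per node plane. This trades the per-element small-array work for one large vectorized operation. It is a sketch of a swapped-in technique, not the answer's code; the function name `cs_coords_for_all_nodes` and the mock data are hypothetical.

```python
import numpy as np
from types import SimpleNamespace


def cs_coords_for_all_nodes(elems, n_pts=7):
    """Return {node_val: (len(elems) * n_pts**2, 2) array of [x, y] rows}.

    Assumes every elem.pos has shape (3, nz, ny, nx) with ny, nx >= n_pts,
    so the x/y parts can be stacked into one regular array.
    """
    # Single Python-level pass: copy x and y of every element into one array
    # of shape (n_el, 2, nz, n_pts, n_pts). Axes: (iel, comp, node, iy, ix).
    xy = np.stack([e.pos[:2, :, :n_pts, :n_pts] for e in elems])
    out = {}
    for node_val in range(xy.shape[2]):
        block = xy[:, :, node_val]  # axes: (iel, comp, iy, ix)
        # Reorder to (iel, ix, iy, comp) to match the loop order, then flatten
        out[node_val] = block.transpose(0, 3, 2, 1).reshape(-1, 2)
    return out


# Hypothetical mock data with the same layout as above
rng = np.random.default_rng(1)
data = SimpleNamespace(
    elem=[SimpleNamespace(pos=rng.random((3, 7, 7, 7))) for _ in range(4)]
)
result = cs_coords_for_all_nodes(data.elem)

# Cross-check one node plane against the original loop order (iel, ix, iy)
n = 2
expected = [[e.pos[0, n, iy, ix], e.pos[1, n, iy, ix]]
            for e in data.elem for ix in range(7) for iy in range(7)]
assert np.array_equal(result[n], np.array(expected))
```

The stacked array is large: for 200000 elements with nz = 7 and float64 values, `xy` takes about 1.1 GB, so stacking only `e.pos[:2, node_val]` one node plane at a time may be preferable. A regular array like `xy` is also the kind of input Numba's nopython mode handles well, unlike a list of Python objects.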