Appending to a numpy array or list inside a for loop: which is preferable?

So I'm working on some code which looks like the following:

import numpy as np

A=np.array([[1,0,3,5,7],[4,0,6,2,3]])

def SMD(matrix):
    if isinstance(matrix,np.ndarray)==False:
        raise ValueError('The needed datatype is an array')
    else:
        m= matrix.shape[0]
        n= matrix.shape[1]
        a=np.array([])
        b=np.array([0])
        c=np.array([])
        for i in range(m):
            for j in range(n):
                if matrix[i][j] !=0:
                    np.append(a,matrix[i][j])
                    np.append(c,j)
            np.append(b,len(a))
        return a,b,c

However, np.append does not work for me in this case. If I use lists instead of arrays, the code runs just fine:

def SMD(matrix):
    if isinstance(matrix,np.ndarray)==False:
        raise ValueError('The needed datatype is an array')
    else:
        m= matrix.shape[0]
        n= matrix.shape[1]
        d=[]
        e=[0]
        f=[]
        for i in range(m):
            for j in range(n):
                if matrix[i][j] !=0:
                    d.append(matrix[i][j])
                    f.append(j)
            e.append(len(d))
        return d,e,f

The desired output is:

[1, 3, 5, 7, 4, 6, 2, 3], [0, 4, 8], [0, 2, 3, 4, 0, 2, 3, 4]

Or as arrays (depending on the code used).

Of course, I would like to know why the first code is not working.

As far as I know, arrays can be preferable in terms of computation speed, but does it make a difference in this case?

Thanks

CodePudding user response：

np.append returns a new array with the values appended; it does not modify its argument in place. You therefore have to assign the returned value back to your variable.
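A minimal sketch of that behaviour, using a small throwaway array (the names here are only for illustration and are not from the question):

```python
import numpy as np

a = np.array([1, 2, 3])

# The return value is discarded here, so `a` is left untouched.
np.append(a, 4)
unchanged = a.copy()

# Rebinding the name keeps the new, longer array.
a = np.append(a, 4)

print(unchanged)  # [1 2 3]
print(a)          # [1 2 3 4]
```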

Fixed Code:

import numpy as np

A=np.array([[1,0,3,5,7],[4,0,6,2,3]])

def SMD(matrix):
  if isinstance(matrix,np.ndarray)==False:
      raise ValueError('The needed datatype is an array')
  else:
      m= matrix.shape[0]
      n= matrix.shape[1]
      a=np.array([])
      b=np.array([0])
      c=np.array([])
      for i in range(m):
          for j in range(n):
              if matrix[i][j] !=0:            
                  a = np.append(a,matrix[i][j])
                  c = np.append(c,j)
          b = np.append(b,len(a))
      return a,b,c
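If you want to keep the loops, one common pattern is to collect the values in lists and convert to arrays once at the end, since every np.append call builds a whole new array. The sketch below assumes that approach; smd_lists and its local names are hypothetical. Note also that starting from np.array([]) gives float arrays, whereas converting the lists at the end keeps the integer values of A.

```python
import numpy as np

def smd_lists(matrix):
    """Hypothetical variant of SMD: append to lists, convert once at the end."""
    if not isinstance(matrix, np.ndarray):
        raise ValueError('The needed datatype is an array')
    values, row_ptr, cols = [], [0], []
    for row in matrix:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)   # non-zero value
                cols.append(j)     # its column index
        row_ptr.append(len(values))  # running count after each row
    return np.array(values), np.array(row_ptr), np.array(cols)

A = np.array([[1, 0, 3, 5, 7], [4, 0, 6, 2, 3]])
a, b, c = smd_lists(A)
print(a, b, c)  # [1 3 5 7 4 6 2 3] [0 4 8] [0 2 3 4 0 2 3 4]
```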

Optimised code

Since you are using numpy, you can also vectorize the computation and avoid the loops:

a = A[A!=0]
b = np.pad(np.cumsum(np.sum(A!=0,axis=1)), (1,0))
c = np.argwhere(A!=0)[:, 1]

print (a,b,c)

Output:

[1 3 5 7 4 6 2 3] [0 4 8] [0 2 3 4 0 2 3 4]

CodePudding user response：

In terms of efficiency, you should avoid loops:

def SMD(matrix):
    bool_matrix = (matrix!=0)
    return (
        matrix[bool_matrix],
        np.append(0, bool_matrix.sum(1).cumsum()),
        np.where(bool_matrix)[1]
    )

SMD(A)
#(array([1, 3, 5, 7, 4, 6, 2, 3]),
# array([0, 4, 8]),
# array([0, 2, 3, 4, 0, 2, 3, 4]))

matrix[bool_matrix] is simply all the non-zero elements of matrix.

np.append(0, bool_matrix.sum(1).cumsum()) first computes the number of non-zero elements in each row of matrix, then takes the cumulative sum (from the first row to the last), and finally puts a 0 at the beginning of the array.

np.where(bool_matrix)[1] gives the column indices of the non-zero elements of matrix.
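To make the three steps concrete, here is a sketch that prints the intermediate values for the A from the question. The variable names (mask, per_row, running, row_ptr) are only for illustration.

```python
import numpy as np

A = np.array([[1, 0, 3, 5, 7], [4, 0, 6, 2, 3]])
mask = A != 0

per_row = mask.sum(axis=1)        # non-zero count in each row: [4 4]
running = per_row.cumsum()        # running total over the rows: [4 8]
row_ptr = np.append(0, running)   # with the leading 0: [0 4 8]

rows, cols = np.where(mask)       # row and column index of every non-zero entry
print(rows)  # [0 0 0 0 1 1 1 1]
print(cols)  # [0 2 3 4 0 2 3 4]
```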

CodePudding user response：

So the reason your code doesn't work is that you are not assigning the output of the np.append operations back to a,b,c. Below is a version of your code that works:

def SMD(matrix):
    if isinstance(matrix,np.ndarray)==False:
        raise ValueError('The needed datatype is an array')
    else:
        m= matrix.shape[0]
        n= matrix.shape[1]
        a=np.array([])
        b=np.array([0])
        c=np.array([])
        for i in range(m):
            for j in range(n):
                if matrix[i,j] !=0:
                    a = np.append(a,matrix[i,j])
                    c = np.append(c,j)
            b = np.append(b,len(a))
        return a,b,c

In terms of efficiency, it would probably be better to initialise a,b,c upfront with the correct sizes (using e.g. np.empty) and then assign the values in the loop.
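A rough sketch of that idea, assuming the number of non-zero entries is counted up front with np.count_nonzero so the arrays can be sized before the loop. smd_prealloc and the write position k are hypothetical names, not part of the original code.

```python
import numpy as np

def smd_prealloc(matrix):
    """Hypothetical variant of SMD that fills preallocated arrays."""
    if not isinstance(matrix, np.ndarray):
        raise ValueError('The needed datatype is an array')
    m, n = matrix.shape
    nnz = np.count_nonzero(matrix)          # total number of non-zero entries
    a = np.empty(nnz, dtype=matrix.dtype)   # non-zero values
    b = np.zeros(m + 1, dtype=int)          # running counts, b[0] stays 0
    c = np.empty(nnz, dtype=int)            # column indices
    k = 0                                   # next free slot in a and c
    for i in range(m):
        for j in range(n):
            if matrix[i, j] != 0:
                a[k] = matrix[i, j]
                c[k] = j
                k += 1
        b[i + 1] = k
    return a, b, c

A = np.array([[1, 0, 3, 5, 7], [4, 0, 6, 2, 3]])
a, b, c = smd_prealloc(A)
print(a, b, c)  # [1 3 5 7 4 6 2 3] [0 4 8] [0 2 3 4 0 2 3 4]
```

This still loops in Python, so it mainly saves the repeated copying done by np.append; the vectorized versions avoid the loop altogether.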