Get column ID of element in loop

I'm trying to create a function that will transform a regular matrix into CSR form (I don't want to use the scipy.sparse one).

To do this, I'm using a nested for-loop to run through a given matrix to create a new matrix with three rows. The first row ('Values') should contain all non-zero values. The second ('Cols') should contain the column index for each number in 'Values'. The third row should contain the index value in 'Values' for the first non-zero value on each row.
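To make the target layout concrete, here is a small sketch that goes the other way: it rebuilds the dense matrix from the three rows. The helper name `to_dense` is hypothetical, and it assumes the third row stores, for each matrix row, the index in 'Values' where that row starts, and that no row is entirely zero.

```python
def to_dense(values, cols, row_starts, n_cols):
    """Hypothetical helper: expand the three CSR-style rows into a list of lists.

    Assumes row_starts[r] is the index in `values` of row r's first non-zero
    and that every row has at least one non-zero value.
    """
    dense = []
    for r, start in enumerate(row_starts):
        # A row ends where the next row starts; the last row ends at len(values).
        end = row_starts[r + 1] if r + 1 < len(row_starts) else len(values)
        row = [0] * n_cols
        for k in range(start, end):
            row[cols[k]] = values[k]  # put each value back in its column
        dense.append(row)
    return dense


# Target rows for M = [[4, 0, 39], [0, 5, 0], [0, 0, 7]], worked out by hand:
values = [4, 39, 5, 7]   # non-zero values, read row by row
cols = [0, 2, 1, 2]      # column index of each entry in `values`
row_starts = [0, 2, 3]   # index in `values` of each row's first non-zero

print(to_dense(values, cols, row_starts, 3))  # [[4, 0, 39], [0, 5, 0], [0, 0, 7]]
```

A row with no non-zero values has no "first non-zero value", so this three-row layout cannot represent it on its own; the usual CSR convention stores a start index for every row plus one final entry equal to the number of values.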

My question regards the second and third rows: Is there a way of getting the column ID for the element 'i' in the for-loop?

import numpy as np

M = np.array([[4, 0, 39],
              [0, 5, 0],
              [0, 0, 7]])

def Convert(x):
    
    CSRMatrix = []
    Values = []
    Cols = []
    Rows = []
    
    for k in x:
        
        for i in k:
        
            if i != 0:
                
                Values.append(i)
                Cols.append(...)  # the column index of 'i'?
                Rows.append(...)  # the index in 'Values' of the first non-zero element on each row?
                    
    CSRMatrix.append(Values)
    CSRMatrix.append(Cols)
    CSRMatrix.append(Rows)
    
    return(CSRMatrix)

Convert(M)

Answer:

I'm not sure exactly what you want for Cols.append(), because of the way you wrote the comment between curly braces.

Is it a dict mapping the index to the value of every non-zero element? A list of sets containing the indexes of all non-zero values (which would be odd)? Or all the indexes of each row in your array?

Anyway, I've included the two most likely candidates (a dict, and a list of indexes for each row). Test each one and delete the one you don't want; if neither is right, please add more specifics:

import numpy as np

m = np.array([[4, 0, 39],
              [0, 5, 0],
              [0, 0, 7]])

def Convert(x):
    
    CSRMatrix = []
    Values = []
    Cols = []
    Rows = []
    
    for num in x:
        for i in range(len(num)):
            if num[i] != 0:
                Values.append(num[i])
                Cols.append({i:num[i]}) # <- if dict. Remove if not what you wanted
                Rows.append(i)  # column index of every non-zero value, not only the first in each row
            Cols.append(i) # <- list of all indexes in the array for each row. Remove if not what you wanted                    

    CSRMatrix.append(Values)
    CSRMatrix.append(Cols)
    CSRMatrix.append(Rows)
    return(CSRMatrix)

x = Convert(m)
print(x)

Answer:

enumerate() yields an index together with each element, so the second row ('Cols') can be built simply by appending num2. For the third row, you need to check whether a value has already been added for the current row: if not, append num2 and set the non_zero flag to False. The flag is reset to True at the start of each new row.

def Convert(x):

    CSRMatrix = []
    Values = []
    Cols = []
    Rows = []
    for num, k in enumerate(x):        # num: row index, k: the row itself
        non_zero = True                # True until this row's first non-zero is found
        for num2, i in enumerate(k):   # num2: column index, i: the element
            if i != 0:

                Values.append(i)
                Cols.append(num2)
                if non_zero:
                    Rows.append(num2)
                    non_zero = False

    CSRMatrix.append(Values)
    CSRMatrix.append(Cols)
    CSRMatrix.append(Rows)

    return CSRMatrix
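The code above stores num2, the column index of each row's first non-zero value. If the third row should instead hold the index in 'Values' of that first value, as the question describes, one option is to record len(Values) at the start of every row. A minimal sketch, using the hypothetical name convert_csr and assuming the standard CSR convention of one extra final entry (the total number of values) is wanted:

```python
def convert_csr(x):
    """Hypothetical helper: return [values, cols, row_starts] for a 2-D list or array.

    Assumes the third row should be the CSR row pointer: the index in `values`
    where each row starts, plus one final entry equal to len(values).
    """
    values, cols, row_starts = [], [], []
    for row in x:
        # The next non-zero found (if any) will be stored at index len(values).
        row_starts.append(len(values))
        for col, v in enumerate(row):  # enumerate supplies the column index
            if v != 0:
                values.append(v)
                cols.append(col)
    row_starts.append(len(values))  # marks where the last row ends
    return [values, cols, row_starts]


M = [[4, 0, 39],
     [0, 5, 0],
     [0, 0, 7]]
print(convert_csr(M))  # [[4, 39, 5, 7], [0, 2, 1, 2], [0, 2, 3, 4]]
```

Because every row gets a start index, an all-zero row simply repeats the previous start value, and row i's entries are values[row_starts[i]:row_starts[i + 1]].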

Answer:

Here is a NumPy-idiomatic implementation: use the nonzero method to get the row and column indexes of the non-zero elements directly, then compare adjacent row indexes to build a mask that selects the column index of the first non-zero element in each row:

>>> import numpy as np
>>> M = np.array([[ 4,  0, 39],
...               [ 0,  5,  0],
...               [ 0,  0,  7]])
>>> r, c = M.nonzero()
>>> mask = np.concatenate(([True], r[1:] != r[:-1]))
>>> [M[r, c], r, c[mask]]
[array([ 4, 39,  5,  7]), array([0, 0, 1, 2], dtype=int64), array([0, 1, 2], dtype=int64)]
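In this session c[mask] gives column indexes. If the third row should instead hold positions in the values array M[r, c] (the reading in the question), the same mask can be turned into those positions with np.flatnonzero. A sketch reusing the names from the session; first_in_values is a hypothetical name:

```python
import numpy as np

M = np.array([[4, 0, 39],
              [0, 5, 0],
              [0, 0, 7]])

r, c = M.nonzero()                                # row/column index of each non-zero
mask = np.concatenate(([True], r[1:] != r[:-1]))  # True where a new row's run begins

values = M[r, c]                                  # the 'Values' row
first_in_values = np.flatnonzero(mask)            # hypothetical name: start of each row in `values`
print(values)           # [ 4 39  5  7]
print(first_in_values)  # [0 2 3]
```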

Test on a larger array:

>>> a = np.random.choice(10, size=(8, 8), p=[0.73] + [0.03] * 9)
>>> a
array([[0, 0, 0, 0, 8, 0, 0, 1],
       [1, 0, 5, 4, 0, 0, 9, 0],
       [0, 0, 9, 0, 0, 0, 0, 1],
       [0, 0, 0, 8, 9, 0, 0, 4],
       [0, 0, 5, 0, 0, 6, 0, 0],
       [0, 8, 0, 0, 0, 0, 0, 9],
       [0, 0, 0, 0, 0, 0, 0, 9],
       [0, 9, 0, 0, 0, 4, 0, 0]])
>>> r, c = a.nonzero()
>>> mask = np.concatenate(([True], r[1:] != r[:-1]))
>>> from pprint import pp
>>> pp([a[r, c], r, c[mask]])
[array([8, 1, 1, 5, 4, 9, 9, 1, 8, 9, 4, 5, 6, 8, 9, 9, 9, 4]),
 array([0, 0, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 7], dtype=int64),
 array([4, 0, 2, 3, 2, 1, 7, 1], dtype=int64)]
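One caveat: the mask only marks rows that contain at least one non-zero, so an all-zero row gets no entry in c[mask]. A common alternative, sketched below under the assumption that every row should get an entry, is a CSR-style row pointer built from per-row counts with np.count_nonzero and np.cumsum; np.repeat then recovers each value's row index for a round-trip check. The names counts, indptr, rows and dense are illustrative:

```python
import numpy as np

a = np.array([[0, 3, 0],
              [0, 0, 0],   # an all-zero row
              [5, 0, 6]])

r, c = a.nonzero()
values = a[r, c]

# Illustrative names. Row i's entries are values[indptr[i]:indptr[i + 1]].
counts = np.count_nonzero(a, axis=1)               # non-zeros per row: [1 0 2]
indptr = np.concatenate(([0], np.cumsum(counts)))  # [0 1 1 3]

# Round trip: repeat each row number once per stored value in that row.
rows = np.repeat(np.arange(a.shape[0]), np.diff(indptr))
dense = np.zeros_like(a)
dense[rows, c] = values
print(indptr)                    # [0 1 1 3]
print(np.array_equal(dense, a))  # True
```

Here the mask approach would give only two entries (c[mask] is [1, 0]) for three rows, while indptr always has one more entry than the number of rows.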