Retrieving values of a CSR matrix

Time:12-02

I have a CSR matrix, and I want to be able to retrieve the column indices and the values.

Here is how I create the matrix (using csr_matrix from scipy.sparse):

indptr = np.empty(nbr_of_rows + 1)  # nbr_of_rows = 134,465
indptr[0] = 0
for i in range(1, len(indptr)):
    indptr[i] = indptr[i-1] + len(data[i-1])  # type(data) = list ; len(data) = 134,465 ; type(data[0]) = numpy.ndarray (each subarray has a different length)
data = np.concatenate(data).ravel()  # now I have type(data) = numpy.ndarray ; len(data) = 2,821,574
ind = np.concatenate(ind).ravel()  # same as above

X = csr_matrix((data, ind, indptr), shape=(nbr_of_rows, nbr_of_columns))  # nbr_of_columns = 3,991
print(f"The matrix has a shape of {X.shape} and a sparsity of {(1 - (X.nnz / (X.shape[0] * X.shape[1]))): .2%}.")
# OUT: The matrix has a shape of (134465, 3991) and a sparsity of 99.47%.

So far so good (at least I think so). But now, even though I manage to retrieve the column indices, I can’t successfully retrieve the values:

np.alltrue(ind == X.nonzero()[1])  # True
np.alltrue(data == X[X.nonzero()])  # False

When I look deeper, I find that almost all the values match (only a small number are wrong):

len(data) == len(X[X.nonzero()].tolist()[0])  # True
len(np.argwhere((data==X[X.nonzero()]) == False))  # 2184

So I get "only" 2,184 wrong values out of 2,821,574 total values.

Can someone please help me in getting all the correct values from my CSR matrix?

CodePudding user response:

Depending on the type of the values you store in the matrix, numpy.float64 or numpy.int64, perhaps, the following post might answer your question: https://github.com/scipy/scipy/issues/13329#issuecomment-753541268

In particular, the comment "Apparently I don't get an error when data is a numpy array rather than a list." suggests that having data as numpy.array rather than a list could solve your problem.

Hopefully, this at least sets you on the right track.

CodePudding user response:

Without your data I can't replicate your problem, and probably wouldn't want to do so even with such a large array.

But I'll try to illustrate what I expect to happen when constructing a matrix this way. From another question I have a small matrix in an IPython session:

In [60]: Mx
Out[60]: 
<1x3 sparse matrix of type '<class 'numpy.intc'>'
    with 2 stored elements in Compressed Sparse Row format>
In [61]: Mx.A
Out[61]: array([[0, 1, 2]], dtype=int32)

nonzero returns the COO-format indices (row, col):

In [62]: Mx.nonzero()
Out[62]: (array([0, 0], dtype=int32), array([1, 2], dtype=int32))
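One thing worth keeping in mind about nonzero: it reports only entries whose stored value is not zero, while nnz counts every stored entry. A minimal sketch, using a made-up 1x3 matrix (the names Z, stored and reported are only for illustration):

```python
import numpy as np
from scipy import sparse

# Hypothetical 1x3 matrix with an explicitly stored zero at column 1.
data = np.array([1.0, 0.0, 2.0])
indices = np.array([0, 1, 2])
indptr = np.array([0, 3])
Z = sparse.csr_matrix((data, indices, indptr), shape=(1, 3))

rows, cols = Z.nonzero()
stored = Z.nnz        # number of stored entries, explicit zero included
reported = len(cols)  # nonzero() leaves out entries whose value is 0
print(stored, reported, cols)
```

If the two counts differ on a real matrix, comparing the original arrays against X.nonzero() element by element will not line up.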

The csr attributes are:

In [63]: Mx.data,Mx.indices,Mx.indptr
Out[63]: 
(array([1, 2], dtype=int32),
 array([1, 2], dtype=int32),
 array([0, 2], dtype=int32))
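These three attributes are enough to read back the column indices and values of any row without going through nonzero. A small sketch, assuming a toy matrix; row_entries and X_demo are hypothetical names, not part of scipy:

```python
import numpy as np
from scipy import sparse

def row_entries(X, i):
    """Return (column indices, values) stored for row i of a CSR matrix."""
    start, stop = X.indptr[i], X.indptr[i + 1]
    return X.indices[start:stop], X.data[start:stop]

# Toy matrix for illustration only.
X_demo = sparse.csr_matrix(np.array([[0, 1, 2], [0, 0, 0], [3, 0, 0]]))
for i in range(X_demo.shape[0]):
    cols, vals = row_entries(X_demo, i)
    print(i, cols, vals)
```

The slices are taken straight from X.indices and X.data, so they come back in storage order.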

Now let's make a new matrix, using the attributes of Mx. Assuming you constructed your indptr, indices, and data correctly, this should imitate what you've done:

In [64]: newM = sparse.csr_matrix((Mx.data, Mx.indices, Mx.indptr))    
In [65]: newM.A
Out[65]: array([[0, 1, 2]], dtype=int32)

data matches between the two matrices:

In [68]: Mx.data==newM.data
Out[68]: array([ True,  True])

The ids of the two data arrays don't match, but their bases do. See my recent answer for why this is relevant:

https://stackoverflow.com/a/74543855/901925

In [75]: id(Mx.data.base), id(newM.data.base)
Out[75]: (2255407394864, 2255407394864)

That means changes to newM will appear in Mx:

In [77]: newM[0,1] = 100
In [78]: newM.A
Out[78]: array([[  0, 100,   2]], dtype=int32)
In [79]: Mx.A
Out[79]: array([[  0, 100,   2]], dtype=int32)
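If that sharing is not wanted, the constructor's copy parameter can be used. A minimal sketch, assuming a toy 1x3 matrix (Mx_demo and independent are illustrative names):

```python
import numpy as np
from scipy import sparse

Mx_demo = sparse.csr_matrix(np.array([[0, 1, 2]]))

# copy=True asks the constructor to copy the arrays instead of reusing them.
independent = sparse.csr_matrix(
    (Mx_demo.data, Mx_demo.indices, Mx_demo.indptr), copy=True
)
independent.data[0] = 100  # change the first stored value in the copy only

print(np.shares_memory(Mx_demo.data, independent.data))
print(Mx_demo.toarray())
print(independent.toarray())
```

toarray() is used here instead of the .A shorthand, which recent scipy versions no longer provide.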

fuller test

Let's try a small scale test of your code:

In [92]: data = np.array([[1.23,2],[3],[]],object); ind = np.array([[1,2],[3],[]],object)
    ...: indptr = np.empty(4)  
    ...: indptr[0] = 0
    ...: for i in range(1, 4):
    ...:     indptr[i] = indptr[i-1] + len(data[i-1])
    ...: data = np.concatenate(data).ravel()    
    ...: ind = np.concatenate(ind).ravel()  # same as above

In [93]: data,ind,indptr
Out[93]: (array([1.23, 2.  , 3.  ]), array([1., 2., 3.]), array([0., 2., 3., 3.]))
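Note that ind and indptr come out as floats here, since np.empty defaults to float64 and the object array was concatenated as is. An alternative sketch that builds integer index arrays with np.cumsum; the names rows_data, rows_ind, indptr_int, flat_data and flat_ind are made up for this example, and this is not necessarily how the original arrays were produced:

```python
import numpy as np

# Per-row values and column indices, as a list of arrays of varying length.
rows_data = [np.array([1.23, 2.0]), np.array([3.0]), np.array([])]
rows_ind = [np.array([1, 2]), np.array([3]), np.array([], dtype=int)]

# indptr[i] is the number of stored entries before row i.
lengths = [len(r) for r in rows_data]
indptr_int = np.concatenate(([0], np.cumsum(lengths)))

flat_data = np.concatenate(rows_data)
flat_ind = np.concatenate(rows_ind).astype(np.int64)

print(indptr_int, flat_ind, flat_data)
```

Integer index arrays avoid relying on scipy to convert float indices when the matrix is built.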

And the sparse matrix:

In [94]: X = sparse.csr_matrix((data, ind, indptr), shape=(3,3))    
In [95]: X
Out[95]: 
<3x3 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements in Compressed Sparse Row format>

data matches:

In [96]: X.data
Out[96]: array([1.23, 2.  , 3.  ])

In [97]: data == X.data
Out[97]: array([ True,  True,  True])

and is in fact a view:

In [98]: data[1] += .23; data
Out[98]: array([1.23, 2.23, 3.  ])    
In [99]: X.A
Out[99]: 
array([[0.  , 1.23, 2.23],
       [0.  , 0.  , 0.  ],
       [3.  , 0.  , 0.  ]])

oops

I made an error in specifying the shape of X: ind contains column index 3, so the matrix needs 4 columns:

In [110]: X = sparse.csr_matrix((data, ind, indptr), shape=(3,4))

In [111]: X.A
Out[111]: 
array([[0.  , 1.23, 2.23, 0.  ],
       [0.  , 0.  , 0.  , 3.  ],
       [0.  , 0.  , 0.  , 0.  ]])

In [112]: X.data
Out[112]: array([1.23, 2.23, 3.  ])

In [113]: X.nonzero()
Out[113]: (array([0, 0, 1], dtype=int32), array([1, 2, 3], dtype=int32))

In [114]: X[X.nonzero()]
Out[114]: matrix([[1.23, 2.23, 3.  ]])

In [115]: data
Out[115]: array([1.23, 2.23, 3.  ])

In [116]: data == X[X.nonzero()]
Out[116]: matrix([[ True,  True,  True]])
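Since the small test matches, the mismatches in the large matrix probably come from something in the real arrays. The sketch below collects a few checks one could run; the causes it looks for (NaN values, explicitly stored zeros, duplicate or unsorted column indices within a row) are guesses, not a diagnosis, and diagnose is a hypothetical helper:

```python
import numpy as np
from scipy import sparse

def diagnose(data, ind, indptr, shape):
    """Report properties that can make data differ from X[X.nonzero()]."""
    X = sparse.csr_matrix((data, ind, indptr), shape=shape)
    return {
        # Direct comparison against the stored values, treating NaN == NaN.
        "data_matches_X_data": bool(np.array_equal(data, X.data, equal_nan=True)),
        # NaN never compares equal to itself with ==.
        "nan_count": int(np.isnan(X.data).sum()),
        # Explicit zeros are stored but skipped by nonzero().
        "explicit_zeros": int((X.data == 0).sum()),
        # False if some row has unsorted or duplicate column indices.
        "canonical_format": bool(X.has_canonical_format),
    }

# Made-up example: row 0 repeats column 1, row 1 holds a NaN.
demo_data = np.array([1.0, 2.0, 3.0, np.nan])
demo_ind = np.array([1, 1, 2, 0])
demo_indptr = np.array([0, 3, 4])
report = diagnose(demo_data, demo_ind, demo_indptr, shape=(2, 3))
print(report)
```

Whatever the checks show, comparing against X.data directly, rather than X[X.nonzero()], avoids going through indexing at all.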