Scipy find minimum nonzero element of a sparse matrix for each row-CodePudding

I am trying to find the both location and the value of the minimum element of a sparse matrix for each row. A toy example for the question is given below: Here, we have a 3x6 sparse matrix "M".

H = np.array([[1, 2, 3, 0, 4, 0 ,0],
              [0, 5, 0, 6, 0, 0 ,0],
              [0, 0, 0, 7, 0, 0 ,8], dtype = np.float32)
M = scipy.sparse.csr_matrix(H)

Then, what I would like to obtain is the nonzero minimum elements of each row. For the example above:

min_elements = some_function(M,axis = 0)

and receiving the return as min_elements = [1,5,7]. The method M.min(axis=0) does not work for my case since the minimum element of each row is zero, therefore, returning an all-zeros array.

Thus, is there any efficient way of implementing such a function in an computationally efficient way using sparse matrix. In my general case, the sparse matrices will be quite huge and requires lots of additional computation. For this reason, the performance/speed is the main benchmark for me.

Thank you!

CodePudding user response：

In [333]: from scipy import sparse

In [334]: M = sparse.csr_matrix(H)    
In [335]: M
Out[335]: 
<3x7 sparse matrix of type '<class 'numpy.float32'>'
    with 8 stored elements in Compressed Sparse Row format>

M is stored as:

In [336]: M.indptr
Out[336]: array([0, 4, 6, 8], dtype=int32)    
In [337]: M.data
Out[337]: array([1., 2., 3., 4., 5., 6., 7., 8.], dtype=float32)    
In [338]: M.indices
Out[338]: array([0, 1, 2, 4, 1, 3, 3, 6], dtype=int32)

We can iterate on the slices defined by indptr, and take the min:

In [340]: for i in range(M.shape[0]):
     ...:     sl = slice(M.indptr[i],M.indptr[i 1])
     ...:     x, y = M.data[sl], M.indices[sl]
     ...:     m = np.argmin(x)
     ...:     print(y[m], x[m])
     ...:     
0 1.0
1 5.0
3 7.0

This can be streamlined a bit, but it gives the basic idea.

It may be easier to picture what's going on in the lil format:

In [341]: Ml = M.tolil()

In [342]: Ml.data
Out[342]: 
array([list([1.0, 2.0, 3.0, 4.0]), list([5.0, 6.0]), list([7.0, 8.0])],
      dtype=object)
In [343]: Ml.rows
Out[343]: array([list([0, 1, 2, 4]), list([1, 3]), list([3, 6])], dtype=object)

In [344]: for d,r in zip(Ml.data, Ml.rows):
     ...:     m = np.argmin(d)
     ...:     print(r[m], d[m])
     ...:     
0 1.0
1 5.0
3 7.0

Previous SO have asked for things like the smallest (or largest) N values by row.

Sparse is best for things that can be expressed as some sort of matrix multiplication. That includes row (or column) sums. Even csr indexing is done with matrix multiplication. Other row-by-row operations aren't as easy.

CodePudding user response：

What about replacing 0 with inf and then finding min:

from scipy import sparse
import numpy as np
H[H == 0] = np.inf
M = sparse.csr_matrix(H)
R = M.min(axis=1)
R.toarray()

Output:

array([[1.],
       [5.],
       [7.]], dtype=float32)

CodePudding user response：

You could flip all your data and find the maximum. This is assuming all your data is positive, as in the example.

M_inv = M.copy()
M_inv.data = 1/M.data
one_over_min_M = M_inv.max(axis=1)

min_M = 1/one_over_min_M.to_array()

On your example I get the output

[[1.       ]
 [5.       ]
 [6.9999995]]

There is some horrible numerical error there, but if you're happy to round your answer...

Edit: This approach might be redeemed if you're after the indices and want to do M_inv.argmax(axis=1), otherwise it's probably not the best.