Why isn't eliminate_zeros() removing the zero entries?

Time:11-05

Code

import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0,0,0], [0,0,1], [1,0,1]])
mat = csr_matrix(arr)
mat.eliminate_zeros()
print(mat.toarray())

Output

[[0 0 0]
 [0 0 1]
 [1 0 1]]

According to the documentation, this method removes the zero entries from the matrix. However, why are there still zeros?

From this website, I've gathered the following:

eliminate_zeros removes all zeros in your matrix from the sparsity pattern (i.e. there is no value stored for that position, when before there was a value stored, but it was 0).
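For the matrix in the question, this can be checked through nnz, the number of stored values. Converting a dense array with csr_matrix keeps only the nonzeros, so there should be no explicit zeros for eliminate_zeros() to drop. A minimal check:

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 1]])
mat = csr_matrix(arr)

# Converting a dense array stores only its nonzero values, so the
# matrix has no explicitly stored zeros to begin with.
nnz_before = mat.nnz
print(nnz_before)   # 3
print(mat.data)     # [1 1 1]

# eliminate_zeros() therefore has nothing to remove.
mat.eliminate_zeros()
print(mat.nnz)      # 3
print((mat.toarray() == arr).all())  # True
```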

I can still access those zero entries.

print(mat[0, 0])
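Indexing still works because an unstored position is an implicit zero, not a missing entry. A small sketch that lists the positions that really hold a stored value and then reads one that does not:

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 1]])
mat = csr_matrix(arr)
mat.eliminate_zeros()

# Collect the (row, col) positions that actually hold a stored value.
coo = mat.tocoo()
stored = sorted(zip(coo.row.tolist(), coo.col.tolist()))
print(stored)            # [(1, 2), (2, 0), (2, 2)]

# (0, 0) has no stored value, yet indexing it still returns 0:
# an unstored position is an implicit zero, not a missing entry.
print((0, 0) in stored)  # False
print(mat[0, 0])         # 0
```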

Answer:

The documentation should probably be more explicit. eliminate_zeros doesn't affect the logical contents of a sparse matrix at all.

eliminate_zeros changes the underlying representation of a sparse matrix without affecting its logical contents. It removes explicitly stored zeros from the data array backing the sparse matrix. It's used to reduce space consumption, and to prepare a sparse matrix for algorithms that assume there will be no explicitly stored zeros.
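To see an explicitly stored zero being removed while the logical contents stay the same, one can build a CSR matrix directly from its three arrays. The small 2x2 matrix below is made up for illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Build a 2x2 CSR matrix directly from (data, indices, indptr).
# Row 0 stores columns 0 and 1, but the value at (0, 1) is an explicit 0.
data = np.array([5, 0, 7])
indices = np.array([0, 1, 1])
indptr = np.array([0, 2, 3])
m = csr_matrix((data, indices, indptr), shape=(2, 2))

before = m.toarray()
print(m.nnz)          # 3: the explicit zero occupies a storage slot

m.eliminate_zeros()
print(m.nnz)          # 2
print(m.data, m.indices, m.indptr)   # [5 7] [0 1] [0 1 2]

# The logical (dense) contents are exactly the same as before.
print((m.toarray() == before).all())  # True
```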

It does not remove logical zeros from the sparse matrix. That wouldn't be possible - you can't have a sparse matrix with a bunch of data-less holes in it: any position without a stored value simply reads as 0. It's not like a masked array.
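The difference from a masked array shows up in reductions. In this sketch, masking the zeros with numpy.ma turns them into holes that are skipped, while the sparse matrix, which also stores only 3 values, still counts the other 6 positions as zeros:

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 1]])

# Masked array: the zeros become "holes" that are skipped entirely.
masked = np.ma.masked_equal(arr, 0)
print(masked.count())  # 3 values take part in computations
print(masked.mean())   # 1.0: the mean of the three 1s only

# Sparse matrix: it also stores only 3 values, but the other 6
# positions are real zeros and still count.
mat = csr_matrix(arr)
print(mat.nnz)         # 3
print(mat.mean())      # about 0.333: the mean over all 9 positions
```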

Answer:

To complement the other answer, I'll show the underlying data storage of your sparse matrix.

In [147]: from scipy import sparse
In [148]: arr = np.array([[0,0,0], [0,0,1], [1,0,1]])

The coo format is easiest to understand

In [149]: M = sparse.coo_matrix(arr)
In [150]: M
Out[150]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 3 stored elements in COOrdinate format>
In [151]: print(M)
  (1, 2)    1
  (2, 0)    1
  (2, 2)    1

The values are actually stored in 3 arrays:

In [152]: M.data,M.row,M.col
Out[152]: 
(array([1, 1, 1]),
 array([1, 2, 2], dtype=int32),
 array([2, 0, 2], dtype=int32))
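These three arrays are the whole matrix. As a sketch, the dense array can be rebuilt from them by placing each data[k] at (row[k], col[k]) and leaving everything else 0:

```python
import numpy as np
from scipy import sparse

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 1]])
M = sparse.coo_matrix(arr)

# Each stored value data[k] belongs at (row[k], col[k]); every other
# position is zero. (This direct assignment assumes no duplicate
# (row, col) pairs; a COO matrix with duplicates would sum them.)
dense = np.zeros(M.shape, dtype=M.dtype)
dense[M.row, M.col] = M.data

print(dense)
print((dense == arr).all())  # True
```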

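csr format keeps the column array (now called indices) but replaces the row array with indptr, a compressed row pointer: the entries of row i sit at positions indptr[i] through indptr[i+1]-1 of data and indices: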

In [153]: Mr = M.tocsr()
In [154]: Mr.data, Mr.indices, Mr.indptr
Out[154]: 
(array([1, 1, 1]),
 array([2, 0, 2], dtype=int32),
 array([0, 0, 1, 3], dtype=int32))
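Reading a CSR matrix back out of those arrays is a loop over row slices. A minimal sketch (plain Python loops, for clarity rather than speed):

```python
import numpy as np
from scipy import sparse

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 1]])
Mr = sparse.csr_matrix(arr)

# In CSR, the stored entries of row i are data[indptr[i]:indptr[i+1]],
# and their column numbers are indices[indptr[i]:indptr[i+1]].
dense = np.zeros(Mr.shape, dtype=Mr.dtype)
for i in range(Mr.shape[0]):
    start, stop = Mr.indptr[i], Mr.indptr[i + 1]
    for k in range(start, stop):
        dense[i, Mr.indices[k]] = Mr.data[k]

# Row 0 is empty because indptr[0] == indptr[1] == 0.
print((dense == arr).all())  # True
```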

Now let's change one element of the data array:

In [155]: Mr.data[1] = 0
In [156]: Mr.data
Out[156]: array([1, 0, 1])

eliminate_zeros finds that 0 and removes it from the data structure, dropping its entry from indices and updating indptr:

In [157]: Mr.eliminate_zeros()
In [158]: Mr.data
Out[158]: array([1, 1])
In [159]: Mr.indices
Out[159]: array([2, 2], dtype=int32)
In [160]: Mr.toarray()
Out[160]: 
array([[0, 0, 0],
       [0, 0, 1],
       [0, 0, 1]])
In [161]: print(Mr)       # show the coo style values
  (1, 2)    1
  (2, 2)    1
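The bookkeeping that eliminate_zeros does on the csr arrays can be sketched in pure NumPy. This is not SciPy's actual implementation (which is compiled code), and drop_explicit_zeros is a hypothetical helper name; the sketch just reproduces the same result for the session above:

```python
import numpy as np
from scipy import sparse

def drop_explicit_zeros(data, indices, indptr):
    """Hypothetical helper: return new (data, indices, indptr) with
    explicitly stored zeros removed, mimicking eliminate_zeros()."""
    keep = data != 0
    new_data = data[keep]
    new_indices = indices[keep]
    # Label each stored entry with its row, count the survivors per row,
    # then rebuild the row pointers as a running total.
    n_rows = len(indptr) - 1
    row_of_entry = np.repeat(np.arange(n_rows), np.diff(indptr))
    kept_per_row = np.bincount(row_of_entry[keep], minlength=n_rows)
    new_indptr = np.concatenate(([0], np.cumsum(kept_per_row)))
    return new_data, new_indices, new_indptr

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 1]])
Mr = sparse.csr_matrix(arr)
Mr.data[1] = 0                      # same edit as in the session above

d, i, p = drop_explicit_zeros(Mr.data, Mr.indices, Mr.indptr)
print(d, i, p)   # [1 1] [2 2] [0 0 1 2]

# SciPy's own method gives the same arrays.
Mr.eliminate_zeros()
print(Mr.data, Mr.indices, Mr.indptr)
```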

Changing the indices and indptr of a csr matrix (i.e. changing "the sparsity pattern") is more work than simply assigning 0 to an element of data. So the csr format lets you make a bunch of changes to data and clean up afterwards.
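A sketch of that workflow, using a small made-up matrix: zero out many values cheaply in data, then rebuild the sparsity pattern once at the end:

```python
import numpy as np
from scipy import sparse

# A small made-up matrix with 6 stored values: 1..6
A = sparse.csr_matrix(np.array([[1, 0, 2],
                                [0, 3, 4],
                                [5, 0, 6]]))

# Cheap step: zero out many values in place. The sparsity pattern
# (indices/indptr) is untouched, so nothing has to be shifted yet.
A.data[A.data % 2 == 1] = 0     # zero the odd values
print(A.nnz)                    # 6: the zeros are still stored

# One clean-up pass at the end rebuilds indices/indptr once.
A.eliminate_zeros()
print(A.nnz)                    # 3
print(A.toarray())
```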

Anyway, eliminate_zeros is not something a beginning user is likely to need.
