I have noticed an apparent inconsistency in how SciPy sparse matrices and NumPy arrays are modified when passed into functions. In particular, could someone explain why the sparse matrix a below is not globally modified by func, but the array b is:
from scipy import sparse
import numpy as np
def func(m):
    m += m  # augmented assignment: doubles m
a = sparse.identity(2)
b = np.array([1, 2])
print(a.todense()) # [[1,0],[0,1]]
func(a)
print(a.todense()) # Still [[1,0],[0,1]]. Why???
print(b) # [1, 2]
func(b)
print(b) # Now [2, 4]
Answer:
In [11]: arr = np.array([[1,0],[2,3]])
In [12]: id(arr)
Out[12]: 1915221691344
In [13]: M = sparse.csr_matrix(arr)
In [14]: id(M)
Out[14]: 1915221319840
In [15]: arr += arr
In [16]: id(arr)
Out[16]: 1915221691344
+= operates in place on the array, so its id is unchanged.
In [17]: M += M
In [18]: id(M)
Out[18]: 1915221323200
For the sparse matrix it creates a new sparse matrix object. It doesn't modify the matrix in-place.
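To tie this back to the function in the question, here is a minimal sketch of the same check done inside a function. The helper name double is made up for illustration, and the behavior shown is what I'd expect from the SciPy version in this session; other versions may differ.

```python
import numpy as np
from scipy import sparse

def double(m):
    # Hypothetical helper: reports whether += kept the same object.
    original = m
    m += m              # ndarray: in-place; sparse matrix: rebinds the local name
    return m is original

arr = np.array([[1, 0], [2, 3]])
M = sparse.csr_matrix(arr)   # copies the values, so M is independent of arr

arr_same = double(arr)       # the caller's array is doubled
M_same = double(M)           # the caller's sparse matrix is left alone

print(arr_same, arr.tolist())
print(M_same, M.toarray().tolist())
```

If the caller needs the doubled sparse matrix, the simplest fix is to return the new object from the function and reassign it, e.g. a = func(a).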
For this operation, the data attribute could be modified in place:
In [20]: M.data
Out[20]: array([2, 4, 6], dtype=int32)
In [21]: M.data += M.data
In [22]: M.A
Out[22]:
array([[ 4,  0],
       [ 8, 12]], dtype=int32)
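Wrapped in a function, that looks roughly like the sketch below. The name double_in_place is hypothetical, and this only works because doubling keeps the same set of stored entries.

```python
import numpy as np
from scipy import sparse

def double_in_place(m):
    # Hypothetical helper: update the stored values directly.
    # The sparsity structure (indices, indptr) is untouched.
    m.data += m.data

M = sparse.csr_matrix(np.array([[1, 0], [2, 3]]))
data_before = M.data

double_in_place(M)

print(M.toarray().tolist())      # the caller's matrix is doubled
print(M.data is data_before)     # same underlying values array
```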
But in general, adding something to a sparse matrix can change its sparsity structure. The sparse developers, in their wisdom, decided it wasn't possible, or just not cost-effective (in programming effort or run time?), to do this without creating a new matrix.
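A small illustration of the sparsity point, using two made-up matrices A and B: the sum has more stored entries than either operand, so it cannot simply be written back into A's existing data array.

```python
from scipy import sparse

A = sparse.csr_matrix([[1, 0], [0, 1]])   # 2 stored entries
B = sparse.csr_matrix([[0, 5], [0, 0]])   # 1 stored entry, at a position A lacks

C = A + B                                 # a new matrix with a new structure

print(A.nnz, B.nnz, C.nnz)
print(C.toarray().tolist())
```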
While a sparse matrix is patterned on the np.matrix subclass, it is not a subclass of ndarray, and is not obligated to behave in exactly the same way.
In [30]: type(M).__mro__
Out[30]:
(scipy.sparse.csr.csr_matrix,
 scipy.sparse.compressed._cs_matrix,
 scipy.sparse.data._data_matrix,
 scipy.sparse.base.spmatrix,
 scipy.sparse.data._minmax_mixin,
 scipy.sparse._index.IndexMixin,
 object)
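The same thing can be checked directly with isinstance rather than by reading the MRO. This is only a quick sketch; the module paths printed above may look different in other SciPy versions.

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.eye(2))

sparse_is_ndarray = isinstance(M, np.ndarray)            # sparse matrix itself
dense_is_ndarray = isinstance(M.toarray(), np.ndarray)   # after converting to dense
matrix_is_subclass = issubclass(np.matrix, np.ndarray)   # np.matrix, for comparison

print(sparse_is_ndarray, dense_is_ndarray, matrix_is_subclass)
print(sparse.issparse(M))
```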