scipy matrix conversion documentation is unclear

Time:11-07

I'm converting a dok to a coo matrix in SciPy, and the documentation seems unclear to me. My goal is not to destroy the original matrix! The documentation states:

Convert this matrix to Compressed Sparse Row format. With copy=False, the data/indices may be shared between this matrix and the resultant csr_matrix.

However, it seems to return a new matrix rather than convert the original one. I thought copy might switch the behaviour from converting in place to creating a copy, but testing shows that can't be true. My program has a long runtime, so I don't want to destroy the matrix accidentally right before exporting it :)

import numpy as np
import scipy.sparse

mat = scipy.sparse.dok_matrix((10,10),dtype=np.int16)
print(type(mat))

<class 'scipy.sparse.dok.dok_matrix'>

coo = mat.tocoo(copy=True)
print(type(mat))

<class 'scipy.sparse.dok.dok_matrix'>

print(type(coo))

<class 'scipy.sparse.coo.coo_matrix'>

coo = mat.tocoo(copy=False)
print(type(mat))

<class 'scipy.sparse.dok.dok_matrix'>

print(type(coo))

<class 'scipy.sparse.coo.coo_matrix'>

Thanks!

Answer:

You shouldn't have to worry about this: dok.tocoo always makes a copy. Also, methods like this return a matrix; they never convert the original in place. The sharing mentioned in the docs applies to the underlying arrays that store values and indices, and only when the two formats are similar enough.

The underlying data structure for a dok is a dict; for a coo it is three NumPy arrays (data, row and col). There's no way to make that conversion without copying the data.
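
A quick way to convince yourself, as a sketch assuming a reasonably recent SciPy and NumPy: fill a small dok, convert it with copy=False, overwrite every value in the coo result, and check that the dok still holds its entries.

```python
import numpy as np
import scipy.sparse as sp

# Small dok standing in for the question's long-running result
mat = sp.dok_matrix((10, 10), dtype=np.int16)
mat[2, 3] = 7
mat[5, 1] = -4

# dok -> coo builds new data/row/col arrays from the dict,
# so copy=False has nothing it could share here
coo = mat.tocoo(copy=False)

# Overwrite every stored value in the coo result...
coo.data[:] = 0

# ...and the dok still holds its original entries
print(mat[2, 3], mat[5, 1])  # 7 -4
```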

The documentation is sloppy here; the text is just copied from a generic template. The copy parameter matters for a "same kind" conversion, e.g. dok.todok() or coo.tocoo(). Conversions across formats will nearly always copy; I say 'nearly' because I'm unsure about a few, such as csr.tocsc.
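
A minimal sketch of the same-kind case with a coo matrix (the identity matrix is only placeholder data): copy=False hands back the original object, while copy=True gives an independent one. np.shares_memory is a NumPy function used here only to check whether two arrays overlap.

```python
import numpy as np
import scipy.sparse as sp

coo = sp.coo_matrix(np.eye(3, dtype=np.int16))

# Same-format conversion without a copy returns the very same object
same = coo.tocoo(copy=False)
# Same-format conversion with copy=True builds an independent matrix
fresh = coo.tocoo(copy=True)

print(same is coo)                             # True
print(fresh is coo)                            # False
print(np.shares_memory(fresh.data, coo.data))  # False

# A write through `same` is therefore a write to `coo`
same.data[0] = 42
print(coo.data[0])                             # 42
```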

If you are writing a function that takes a sparse matrix of any format, and needs to ensure it is, say, coo, you might want to use

 M1 = M.tocoo(copy=True)

to ensure that any changes to M1 won't appear in M, even if M was already coo.
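
As a sketch of that pattern, here is a hypothetical helper clipped_coo (the name and the clipping task are made up for illustration). It modifies the converted matrix in place and still leaves its argument untouched, whatever format the argument arrives in.

```python
import numpy as np
import scipy.sparse as sp

def clipped_coo(M, limit):
    """Hypothetical helper: return a coo copy of M with values clipped to +/-limit.

    copy=True guarantees M is never modified, even when M is already coo
    (with copy=False, a coo input would come back as the same object).
    """
    M1 = M.tocoo(copy=True)
    # Clip the stored values in place; only M1's own array is touched
    np.clip(M1.data, -limit, limit, out=M1.data)
    return M1

A = sp.coo_matrix(np.array([[0, 5], [-9, 2]]))
B = clipped_coo(A, 3)
print(A.toarray())  # unchanged: [[0, 5], [-9, 2]]
print(B.toarray())  # clipped:   [[0, 3], [-3, 2]]
```

Had the helper used M.tocoo(copy=False), a coo argument would be returned as the same object and np.clip would overwrite A's values.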


The passage you quoted is from the tocsr docstring. The actual code for the dok version is

self.tocoo(copy=copy).tocsr(copy=False)

The dok is first converted to the common coo format, and from there to csr (or one of the other formats).
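
You can check that going through coo gives the same result as calling tocsr directly. This sketch only compares the outputs; it does not show which internal route your SciPy version actually takes.

```python
import numpy as np
import scipy.sparse as sp

mat = sp.dok_matrix((4, 4), dtype=np.int16)
mat[0, 1] = 3
mat[3, 2] = 5

direct = mat.tocsr()           # one call on the dok
via_coo = mat.tocoo().tocsr()  # explicit dok -> coo -> csr route

print(direct.format, via_coo.format)                         # csr csr
print(np.array_equal(direct.toarray(), via_coo.toarray()))  # True
```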

The code for dok.tocoo is:

def tocoo(self, copy=False):
    from .coo import coo_matrix
    if self.nnz == 0:
        return coo_matrix(self.shape, dtype=self.dtype)

    idx_dtype = get_index_dtype(maxval=max(self.shape))
    data = np.fromiter(self.values(), dtype=self.dtype, count=self.nnz)
    row = np.fromiter((i for i, _ in self.keys()), dtype=idx_dtype, count=self.nnz)
    col = np.fromiter((j for _, j in self.keys()), dtype=idx_dtype, count=self.nnz)
    A = coo_matrix((data, (row, col)), shape=self.shape, dtype=self.dtype)
    A.has_canonical_format = True
    return A
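
The same steps can be re-created outside SciPy's internals to show why a copy is unavoidable: every np.fromiter call allocates a fresh array from the dict. dict_to_coo is a hypothetical helper, and the fixed np.int32 index dtype is an assumption standing in for SciPy's internal get_index_dtype choice.

```python
import numpy as np
import scipy.sparse as sp

def dict_to_coo(entries, shape, dtype):
    """Hypothetical re-creation of the dok -> coo step for a plain
    {(row, col): value} dict. np.int32 indices are an assumption."""
    nnz = len(entries)
    if nnz == 0:
        # Empty matrix of the requested shape
        return sp.coo_matrix(shape, dtype=dtype)
    # Each np.fromiter call allocates a brand-new array,
    # so nothing can be shared with the dict
    data = np.fromiter(entries.values(), dtype=dtype, count=nnz)
    row = np.fromiter((i for i, _ in entries.keys()), dtype=np.int32, count=nnz)
    col = np.fromiter((j for _, j in entries.keys()), dtype=np.int32, count=nnz)
    return sp.coo_matrix((data, (row, col)), shape=shape, dtype=dtype)

# Compare against SciPy's own dok -> coo conversion
entries = {(2, 3): 7, (5, 1): -4}
A = dict_to_coo(entries, (10, 10), np.int16)

ref = sp.dok_matrix((10, 10), dtype=np.int16)
for (i, j), v in entries.items():
    ref[i, j] = v

print(np.array_equal(A.toarray(), ref.tocoo().toarray()))  # True
```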