Memory efficient dot product between a sparse matrix and a non-sparse numpy matrix


I have gone through similar questions that have been asked before (for example [1] [2]). However, none of them is completely relevant to my problem.

I am trying to calculate the dot product of two large matrices, and I have a memory constraint that I have to meet.

I have a sparse matrix of shape (10000, 600000), currently held as a dense numpy array. For example,

from scipy import sparse as sps
x = sps.random(m=10000, n=600000, density=0.1).toarray()

The second numpy matrix has shape (600000, 256) and consists only of the values -1 and 1.

import numpy as np
y = np.random.choice([-1,1], size=(600000, 256))

I need the dot product of x and y with the lowest possible memory footprint. Speed is not the primary concern.
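For scale, here are rough footprints of the arrays involved (back-of-the-envelope, assuming the default dtypes on a 64-bit Linux build):

GiB = 2**30
print(10000 * 600000 * 8 / GiB)              # dense x, float64: ~44.7 GiB
print(int(10000 * 600000 * 0.1) * 12 / GiB)  # x as CSR, 8B data + 4B int32 index per nonzero: ~6.7 GiB
print(600000 * 256 * 8 / GiB)                # y, int64 from np.random.choice: ~1.1 GiB
print(10000 * 256 * 8 / GiB)                 # z = x.dot(y), float64: ~0.02 GiB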

Here is what I have tried so far:

Scipy Sparse Format:

Naturally, I converted the dense numpy array to a scipy csr_matrix. However, the task still gets killed due to a memory issue. There is no error; the process simply gets killed in the terminal.

from scipy import sparse as sps
sparse_x = sps.csr_matrix(x, copy=False)
z = sparse_x.dot(y)
# killed
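
Note that sps.csr_matrix(x, copy=False) still needs the full dense x resident while it scans for nonzeros. For comparison, a sketch in which the matrix is created sparse from the start (sps.random accepts a format argument), so the ~45 GiB dense intermediate never exists:

from scipy import sparse as sps
import numpy as np

# Build directly in CSR format; no dense copy of x is ever created.
sparse_x = sps.random(m=10000, n=600000, density=0.1, format='csr', dtype=np.float32)
y = np.random.choice([-1, 1], size=(600000, 256)).astype(np.int8)
z = sparse_x.dot(y)  # (10000, 256) result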

Decreasing dtype precision + Scipy Sparse Format:

from scipy import sparse as sps

x = x.astype("float16", copy=False)
y = y.astype("int8", copy=False)

sparse_x = sps.csr_matrix(x, copy=False)
z = sparse_x.dot(y)
# Increases the memory requirement for some reason and dies 
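
A hedged guess at why this backfires: as far as I know, float16 is not among the dtypes scipy's compiled sparse routines support, so the multiplication upcasts anyway, and each astype over the huge dense array makes a fresh copy. A variant that converts to CSR first and only then downcasts the (much smaller) nonzero data, and only to float32, might behave better:

from scipy import sparse as sps
import numpy as np

# Sketch, assuming x is still the original float64 dense array.
# The CSR .data array holds only the nonzeros (~10x smaller at density 0.1),
# so downcasting after conversion avoids a second full-size dense copy.
sparse_x = sps.csr_matrix(x).astype(np.float32)
y8 = y.astype(np.int8)  # values are only -1/1, so int8 is lossless
z = sparse_x.dot(y8)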

np.einsum

I was not sure whether it helps/works with sparse matrices. I found something interesting in this answer. However, the following doesn't help either:

z = np.einsum('ij,jk->ik', x, y)
# similar memory requirement as the scipy sparse dot

Suggestions?

If you have any suggestions to improve any of these, please let me know. Further, I am thinking in the following directions:

  1. It would be great if I could somehow get rid of the dot product itself. My second matrix (i.e. y) is randomly generated and contains only [-1, 1]. I am hoping there is a way to take advantage of this structure.

  2. Maybe dividing the dot product into several small dot products and then aggregating (see the sketch below).
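
Here is a minimal sketch of direction 2 (the chunk size is a hypothetical knob; it assumes x is already a scipy CSR matrix): multiply one block of rows at a time, so the dense workspace at any moment is just one slice of the result:

import numpy as np

def chunked_dot(sparse_x, y, chunk_rows=1000):
    # Compute sparse_x @ y one row block at a time to cap peak workspace.
    m = sparse_x.shape[0]
    out = np.empty((m, y.shape[1]), dtype=np.float32)
    for start in range(0, m, chunk_rows):
        stop = min(start + chunk_rows, m)
        # Each iteration only allocates a (chunk_rows, y.shape[1]) temporary.
        out[start:stop] = sparse_x[start:stop].dot(y)
    return out

As for direction 1, storing y as int8 (one byte per entry, ~150 MB) already seems close to the minimum short of bit-packing it.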

CodePudding user response:

The deepest Python code for M@x, where M is a sparse csr matrix and x is a dense array, is:

In [28]: M._mul_multivector??
Signature: M._mul_multivector(other)
Docstring: <no docstring>
Source:   
    def _mul_multivector(self, other):
        M, N = self.shape
        n_vecs = other.shape[1]  # number of column vectors

        result = np.zeros((M, n_vecs),
                          dtype=upcast_char(self.dtype.char, other.dtype.char))

        # csr_matvecs or csc_matvecs
        fn = getattr(_sparsetools, self.format + '_matvecs')
        fn(M, N, n_vecs, self.indptr, self.indices, self.data,
           other.ravel(), result.ravel())

        return result

The fn here is compiled code. It gets the attribute arrays of the csr matrix, the dense array as a 1d array, and the preallocated result array - also as a flattened view.

I wonder whether the memory error occurs when creating result, because I suspect fn uses that memory as its workspace and does not try to use any more.

I know that the sparse@sparse matrix multiplication is done in two compiled steps. The first determines the dimensions and nnz of the result, and the second actually does the calculation. The python code between the two steps creates the result array, much as is being done here. And memory errors with this @ occur in the result allocation.
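
For the shapes in the question, that preallocated result is actually modest, which suggests the memory pressure comes from the operands rather than from result itself (quick check, assuming a float64 upcast):

import numpy as np

result = np.zeros((10000, 256), dtype=np.float64)
print(result.nbytes / 1e6)  # ~20.5 MB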

CodePudding user response:

If you were able to create your arrays without any memory problem (which I had for the prepared example sizes), why not just use iterations? The loop equivalent of the dot product could be a solution, and it can be accelerated with numba:

import numpy as np
import numba as nb

@nb.njit  # ("float64[:, ::1](float64[:, ::1], int32[:, ::1])")
def dot(a, b):
    # Loop equivalent of a @ b, accumulating into a preallocated result.
    dot_ = np.zeros((a.shape[0], b.shape[1]))
    for i in range(a.shape[1]):
        for j in range(a.shape[0]):
            for k in range(b.shape[1]):
                dot_[j, k] += a[j, i] * b[i, k]
    return dot_

The result will easily fit in memory for your prepared example, as Warren mentioned in the comments.
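
A quick correctness check of the loop version against numpy, on small hypothetical shapes:

import numpy as np

a = np.random.rand(100, 500)                                   # small stand-in for x
b = np.random.choice([-1, 1], size=(500, 64)).astype(np.int8)  # stand-in for y
assert np.allclose(dot(a, b), a @ b)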
