I'm trying to execute a matrix multiplication which has the following scheme:
C = np.dot(np.dot(sparse.csr_matrix(np.double(A).transpose()),
                  sparse.spdiags(B, 0, Ngrid, Ngrid)),
           sparse.csr_matrix(np.double(A)))
Thus, I want to transpose matrix A, which leads to an N x M matrix with M >> N, and multiply it with the diagonal matrix, which is an M x M matrix whose main diagonal is B. The resulting matrix (N x M) should then be multiplied with matrix A (M x N) to give the N x N matrix C.
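For reference, the same scheme at toy sizes (a sketch with made-up dimensions; judging from the output below, the real case has N = 2000 and M = Ngrid = 921600):

import numpy as np
from scipy import sparse

N, M = 3, 8                        # toy stand-ins for N = 2000, M = 921600
A = np.random.rand(M, N)           # A is M x N
B = np.random.rand(M)              # main-diagonal values, length M

At = sparse.csr_matrix(A.T)              # N x M
D = sparse.spdiags(B, 0, M, M)           # M x M diagonal
C = At @ D @ sparse.csr_matrix(A)        # N x N sparse result
print(C.shape)                           # (3, 3)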
The problem appears as follows:
<2000x921600 sparse matrix of type '<class 'numpy.float64'>'
with 1843066024 stored elements in Compressed Sparse Row format>
As the final matrix is N x N, I want to have it as a numpy array. As you can see, I build the matrix in the middle as a sparse diagonal matrix, which works well. But I can't understand why Python needs this insanely large matrix with 1843066024 stored elements to carry out the multiplication.
Do you have any ideas and/or an explanation for why this problem appears?
CodePudding user response:
You're making this overly complicated. Note that sparse.csr_matrix(np.double(A).transpose()) stores essentially every entry of your dense 2000 x 921600 array explicitly, which is where the ~1.8 billion stored elements come from. Here's a straightforward path for M >> N (you're inconsistent about that in the question).
import numpy as np
from scipy import sparse

B = sparse.spdiags(B, 0, Ngrid, Ngrid)  # [M x M] sparse diagonal
A = np.ndarray(..., dtype=float)        # [M x N] dense
C = A.T @ B                             # [N x M] dense
C = C @ A                               # [N x N] dense
C is then the array that you want. None of your intermediate products are M x M. If you still have memory problems, you either need to get more memory, or you need to chunk your problem into smaller pieces along your M axis and calculate them piecewise, as sketched below.
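One way to do that piecewise calculation, as a minimal sketch: it assumes A fits in memory as an (M, N) dense array and that B really is diagonal; the function name and chunk size are made up.

import numpy as np

def chunked_product(A, B, chunk=100_000):
    # Accumulate C = A.T @ B @ A block by block along the M axis.
    # Because B is diagonal, each block of rows of A contributes
    # independently, so nothing bigger than a (chunk, N) temporary is built.
    M, N = A.shape
    b = B.diagonal()                 # length-M main diagonal of the sparse B
    C = np.zeros((N, N))
    for start in range(0, M, chunk):
        stop = min(start + chunk, M)
        A_blk = A[start:stop]                        # (block, N) slice of A
        C += A_blk.T @ (b[start:stop, None] * A_blk) # this block's contribution
    return C

Usage would be something like chunked_product(A, sparse.spdiags(B, 0, Ngrid, Ngrid)).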
CodePudding user response:
If B is diagonal, you don't need to use sparse to save memory. You can do the calculation with a 1d array of just the diagonal values. I'll demonstrate with small dimensions, where making a full B doesn't cause problems. Others can test this with large dimensions.
In [5]: A = np.arange(24).reshape(3,8)
In [7]: B = np.diag(np.arange(8))
In [8]: B.shape
Out[8]: (8, 8)
The double matmul:
In [10]: A@[email protected]
Out[10]:
array([[ 784, 1904, 3024],
[ 1904, 4816, 7728],
[ 3024, 7728, 12432]])
The equivalent einsum:
In [12]: np.einsum('ij,jk,lk',A,B,A)
Out[12]:
array([[ 784, 1904, 3024],
[ 1904, 4816, 7728],
[ 3024, 7728, 12432]])
The 1d diagonal of B:
In [15]: b = np.diag(B)
A broadcasted multiplication does the same thing as the matmul:
In [17]: np.allclose(A*b,A@B)
Out[17]: True
In [18]: np.allclose(A*[email protected],A@[email protected])
Out[18]: True
Expressing that with einsum:
In [19]: np.einsum('ij,j,lj',A,b,A)
Out[19]:
array([[ 784, 1904, 3024],
[ 1904, 4816, 7728],
[ 3024, 7728, 12432]])
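Translated to the orientation in the question, where A is (M, N) and b holds the M diagonal values, the same broadcast trick would look like this (a sketch with stand-in sizes):

import numpy as np

M, N = 1000, 20                  # stand-in sizes for the real M >> N case
A = np.random.rand(M, N)         # dense (M, N), as in the question
b = np.random.rand(M)            # 1d main-diagonal values of B

C = A.T @ (b[:, None] * A)       # A.T @ diag(b) @ A as an (N, N) array, no M x M matrix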
Some comparative times:
In [20]: timeit np.einsum('ij,j,lj',A,b,A)
10.6 µs ± 22.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [21]: timeit np.einsum('ij,jk,lk',A,B,A)
15.2 µs ± 13.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [22]: timeit A@[email protected]
7.26 µs ± 20.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [23]: timeit A*[email protected]
8.17 µs ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
For einsum the 'condensed' version is faster. With matmul the full diagonal is marginally faster.
But with large arrays, where creating a full B might be a problem, using b might be faster. It's also been observed in other SO answers that iterating on smaller arrays can be faster, due to better memory handling. For example:
np.array([A[i,:]*[email protected] for i in range(3)])