I'm trying to execute a matrix multiplication which has the following scheme:
C = np.dot(np.dot(sparse.csr_matrix(np.double(A).transpose()),
                  sparse.spdiags(B, 0, Ngrid, Ngrid)),
           sparse.csr_matrix(np.double(A)))
Thus, I want to transpose matrix A, which leads to an N x M matrix with M >> N, and multiply it with the diagonal matrix, which is an M x M matrix whose main diagonal is B. The resulting matrix (N x M) should then be multiplied with matrix A (M x N) to give the N x N matrix C.
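For reference, the same scheme at toy sizes (a sketch with made-up dimensions; judging from the output below, the real case has N = 2000 and M = Ngrid = 921600):

import numpy as np
from scipy import sparse

N, M = 3, 8                        # toy stand-ins for N = 2000, M = 921600
A = np.random.rand(M, N)           # A is M x N
B = np.random.rand(M)              # main-diagonal values, length M

At = sparse.csr_matrix(A.T)              # N x M
D = sparse.spdiags(B, 0, M, M)           # M x M diagonal
C = At @ D @ sparse.csr_matrix(A)        # N x N sparse result
print(C.shape)                           # (3, 3)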
The problem appears as follows:
<2000x921600 sparse matrix of type '<class 'numpy.float64'>'
with 1843066024 stored elements in Compressed Sparse Row format>
As the final matrix is N x N, I want to have it as a numpy array. As you can see, I build the matrix in the middle as a sparse diagonal matrix, which works well. But I can't understand why Python needs this insanely large matrix with 1843066024 stored elements to carry out the multiplication.
Do you have any ideas and/or an explanation for why this problem appears?
CodePudding user response:
You're making this overly complicated. Note that sparse.csr_matrix(np.double(A).transpose()) stores essentially every entry of your dense 2000 x 921600 array explicitly, which is where the ~1.8 billion stored elements come from. Here's a straightforward path for M >> N (you're inconsistent about that in the question).
import numpy as np
from scipy import sparse

B = sparse.spdiags(B, 0, Ngrid, Ngrid)  # [M x M] sparse diagonal
A = np.ndarray(..., dtype=float)        # [M x N] dense
C = A.T @ B                             # [N x M] dense
C = C @ A                               # [N x N] dense
C is then the array that you want. None of your intermediate products are M x M. If you still have memory problems, you either need to get more memory, or you need to chunk your problem into smaller pieces along your M axis and calculate them piecewise, as sketched below.
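One way to do that piecewise calculation, as a minimal sketch: it assumes A fits in memory as an (M, N) dense array and that B really is diagonal; the function name and chunk size are made up.

import numpy as np

def chunked_product(A, B, chunk=100_000):
    # Accumulate C = A.T @ B @ A block by block along the M axis.
    # Because B is diagonal, each block of rows of A contributes
    # independently, so nothing bigger than a (chunk, N) temporary is built.
    M, N = A.shape
    b = B.diagonal()                 # length-M main diagonal of the sparse B
    C = np.zeros((N, N))
    for start in range(0, M, chunk):
        stop = min(start + chunk, M)
        A_blk = A[start:stop]                        # (block, N) slice of A
        C += A_blk.T @ (b[start:stop, None] * A_blk) # this block's contribution
    return C

Usage would be something like chunked_product(A, sparse.spdiags(B, 0, Ngrid, Ngrid)).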
CodePudding user response:
If B is diagonal, you don't need to use sparse to save memory. You can do the calculation with a 1d array of just the diagonal values. I'll demonstrate with small dimensions, where making a full B doesn't cause problems. Others can test this with large dimensions.
In [5]: A = np.arange(24).reshape(3,8)
In [7]: B = np.diag(np.arange(8))
In [8]: B.shape
Out[8]: (8, 8)
The double matmul:
In [10]: A@[email protected]
Out[10]:
array([[ 784, 1904, 3024],
[ 1904, 4816, 7728],
[ 3024, 7728, 12432]])
The equivalent einsum:
In [12]: np.einsum('ij,jk,lk',A,B,A)
Out[12]:
array([[ 784, 1904, 3024],
[ 1904, 4816, 7728],
[ 3024, 7728, 12432]])
The 1d diagonal of B:
In [15]: b = np.diag(B)
A broadcasted multiplication does the same thing as the matmul:
In [17]: np.allclose(A*b,A@B)
Out[17]: True
In [18]: np.allclose(A*[email protected],A@[email protected])
Out[18]: True
Expressing that with einsum:
In [19]: np.einsum('ij,j,lj',A,b,A)
Out[19]:
array([[ 784, 1904, 3024],
[ 1904, 4816, 7728],
[ 3024, 7728, 12432]])
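Translated to the orientation in the question, where A is (M, N) and b holds the M diagonal values, the same broadcast trick would look like this (a sketch with stand-in sizes):

import numpy as np

M, N = 1000, 20                  # stand-in sizes for the real M >> N case
A = np.random.rand(M, N)         # dense (M, N), as in the question
b = np.random.rand(M)            # 1d main-diagonal values of B

C = A.T @ (b[:, None] * A)       # A.T @ diag(b) @ A as an (N, N) array, no M x M matrix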
Some comparative times:
In [20]: timeit np.einsum('ij,j,lj',A,b,A)
10.6 µs ± 22.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [21]: timeit np.einsum('ij,jk,lk',A,B,A)
15.2 µs ± 13.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [22]: timeit A@[email protected]
7.26 µs ± 20.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [23]: timeit A*[email protected]
8.17 µs ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
For einsum the 'condensed' version is faster. With matmul the full diagonal is marginally faster.
But with large arrays, where creating a full B might be a problem, using b might be faster. It's also been observed in other SO answers that iterating on smaller arrays can be faster, due to better memory handling. For example:
np.array([A[i,:]*[email protected] for i in range(3)])