I have two sparse matrices. The first is:
<1x40 sparse matrix of type '<class 'numpy.intc'>'
with 10 stored elements in Compressed Sparse Row format>
And the second one:
<9x15426 sparse matrix of type '<class 'numpy.int64'>'
with 25 stored elements in Compressed Sparse Row format>
I want to append the 40 columns of the first matrix to each of the 9 rows (each <1x15426>) of the second matrix, so that the resulting matrix is:
<9x15466 sparse matrix of type '<class 'numpy.int64'>'
with 25 stored elements in Compressed Sparse Row format>
Is this possible without converting to dense arrays? Thanks!
CodePudding user response:
OK, my previous answer was premature, even though it was correct. Here is a better attempt:
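from scipy.sparse import csr_matrix, hstack
import numpy as np

# Example inputs. Note: here csr1 is the 9-row matrix and csr2 is the single row.
csr1 = csr_matrix(
    np.random.randint(0, 3, (9, 15426)),
)
csr2 = csr_matrix(
    np.random.randint(0, 3, (1, 40)),
)

# For csr_matrix, * is the matrix product: (9x1 ones) * (1x40 row) repeats the row 9 times.
# Using csr1's dtype for the ones keeps the result from being upcast to float.
hstack((csr1, csr_matrix(np.ones([9, 1], dtype=csr1.dtype)) * csr2[0]), format="csr")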
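For comparison, here is a minimal sketch of an alternative that tiles the row with vstack instead of multiplying by a column of ones. The names `row`, `big`, `tiled` and `result`, and the small hand-picked entries, are made up for illustration; only the shapes and dtypes follow the question.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack, vstack

# Hypothetical inputs with the same shapes and dtypes as in the question
row = csr_matrix(([1, 2, 3], ([0, 0, 0], [0, 5, 39])), shape=(1, 40), dtype=np.intc)
big = csr_matrix(([4, 5], ([0, 8], [10, 15425])), shape=(9, 15426), dtype=np.int64)

# Repeat the single row once per row of `big`, staying sparse throughout
tiled = vstack([row] * big.shape[0], format="csr")

# Append the tiled block to the right of `big`
result = hstack([big, tiled], format="csr", dtype=np.int64)

print(result.shape)  # (9, 15466)
print(result.nnz)    # 29, i.e. 2 + 9 * 3
```

Either way, the number of stored elements grows by 9 times the count in the single row, so with the matrices from the question you should expect 25 + 9 * 10 = 115 stored elements rather than 25.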
CodePudding user response:
Yes, it is possible. You just need to work directly with the matrix's internal arrays (data, indices, indptr) and carefully loop over the rows. Here is an example (you could probably skip converting to lists, but I find concatenation simpler with lists):
from scipy.sparse import csr_matrix
import numpy as np

csr1 = csr_matrix(
    np.random.randint(0, 3, (9, 15426)),
)
csr2 = csr_matrix(
    np.random.randint(0, 3, (1, 40)),
)

def glue_csr2_to_csr1(csr1: csr_matrix, csr2: csr_matrix) -> csr_matrix:
    curr_pointer = 0
    csr1_data = list(csr1.data)
    csr1_indices = list(csr1.indices)
    csr1_indptr = list(csr1.indptr)
    csr2_data = list(csr2.data)
    csr2_indices = list(csr2.indices)
    csr2_indptr = list(csr2.indptr)
    new_pointers = [0]
    new_data = []
    new_indices = []
    for row_index in range(len(csr1.indptr) - 1):
        # stored values and column indices of this row of csr1
        row_data = csr1_data[csr1_indptr[row_index]:csr1_indptr[row_index + 1]]
        row_indices = csr1_indices[csr1_indptr[row_index]:csr1_indptr[row_index + 1]]
        # append csr2's values after this row's values
        new_row_data = row_data + csr2_data
        new_data += new_row_data
        # shift csr2's column indices past the last column of csr1
        new_row_indices = row_indices + [x + csr1.shape[1] for x in csr2_indices]
        new_indices += new_row_indices
        # csr2_indptr[1] is the number of stored elements in csr2's single row
        curr_pointer += len(row_data) + csr2_indptr[1]
        new_pointers.append(curr_pointer)
    res = csr_matrix((new_data, new_indices, new_pointers))
    # the inferred width may be too small if csr2 ends in zero columns
    res.resize(csr1.shape[0], csr1.shape[1] + csr2.shape[1])
    return res

glue_csr2_to_csr1(csr1, csr2)
Output:
<9x15466 sparse matrix of type '<class 'numpy.int64'>'
with 92799 stored elements in Compressed Sparse Row format>
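If the data, indices and indptr arrays used in the loop are unfamiliar, this small sketch (a toy 2x3 matrix `m` that I made up, not the matrices from the question) shows what they hold and how one row is sliced out of them:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy matrix, chosen only to make the arrays easy to read
m = csr_matrix(np.array([[0, 7, 0],
                         [5, 0, 9]]))

print(m.data)     # [7 5 9]  stored values, row by row
print(m.indices)  # [1 0 2]  column index of each stored value
print(m.indptr)   # [0 1 3]  row i occupies data[indptr[i]:indptr[i + 1]]

# Slice out the second row (index 1), as the loop above does for every row
start, end = m.indptr[1], m.indptr[2]
row_values = m.data[start:end]
row_columns = m.indices[start:end]
print(row_values, row_columns)  # [5 9] [0 2]
```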
If I misunderstood your question and you don't need to append the elements of the second matrix (and just want to pad with zeros), then call csr1.resize(9, 15466) and that should be it. Note that resize modifies the matrix in place.
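A quick sketch of the zero-padding case on a small made-up matrix `m`, just to show that resize works in place and adds no stored elements:

```python
from scipy.sparse import csr_matrix

# Toy 3x4 matrix with two stored elements (values chosen arbitrarily)
m = csr_matrix(([1, 2], ([0, 2], [1, 3])), shape=(3, 4))

# resize changes m itself and returns None; the new columns are implicit zeros
m.resize(3, 6)

print(m.shape)  # (3, 6)
print(m.nnz)    # 2
```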