Problem:
I have a list of tuples, which each tuple represents a column of a 2D-array and each element of the tuple represents the index of that column of the array that is a 1; the other entries that aren't in that tuple, are 0.
I want to create an sparse matrix with this list of tuples in an efficient way (trying to not use for loops).
Example:
# init values
list_tuples = [
(0, 2, 4),
(0, 2, 3),
(1, 3, 4)
]
n = length(list_tuples) 1
m = 5 # arbritrary, however n >= max([ei for ei in list_tuples]) 1
# what I need is a function which accepts this tuples and give the shape of the array
# (at least the row size, because the column size can be infered from the list of tuples)
A = some_function(list_tuples, array_shape = (m, n))
Then what I expect to have is an array of the form:
[
[1, 1, 0]
[0, 0, 1]
[1, 1, 0]
[0, 1, 1]
[1, 0, 1]
]
CodePudding user response:
Your values are the indices
that are required for the compressed sparse column format. You'll also need the indptr
array, which for your data is the cumulative sum of the lengths of the tuples (prepended with 0). The data
array would be an array of ones with the same length as the sum of the lengths of the tuples, which you can get from the last element of the cumulative sum. Here's how that looks with your example:
In [45]: from scipy.sparse import csc_matrix
In [46]: list_tuples = [
...: (0, 2, 4),
...: (0, 2, 3),
...: (1, 3, 4)
...: ]
In [47]: indices = sum(list_tuples, ()) # Flatten the tuples into one sequence.
In [48]: indptr = np.cumsum([0] [len(t) for t in list_tuples])
In [49]: a = csc_matrix((np.ones(indptr[-1], dtype=int), indices, indptr))
In [50]: a
Out[50]:
<5x3 sparse matrix of type '<class 'numpy.int64'>'
with 9 stored elements in Compressed Sparse Column format>
In [51]: a.A
Out[51]:
array([[1, 1, 0],
[0, 0, 1],
[1, 1, 0],
[0, 1, 1],
[1, 0, 1]])
Note that csc_matrix
inferred the number of rows from the maximum that it found in the indices. You can use the shape
parameter to override that, e.g.
In [52]: b = csc_matrix((np.ones(indptr[-1], dtype=int), indices, indptr), shape=(7, len(list_tuples)))
In [53]: b
Out[53]:
<7x3 sparse matrix of type '<class 'numpy.int64'>'
with 9 stored elements in Compressed Sparse Column format>
In [54]: b.A
Out[54]:
array([[1, 1, 0],
[0, 0, 1],
[1, 1, 0],
[0, 1, 1],
[1, 0, 1],
[0, 0, 0],
[0, 0, 0]])
You can also generate a coo_matrix
pretty easily. The flattened list_tuples
gives the row indices, and np.repeat
can be used to create the column indices:
In [63]: from scipy.sparse import coo_matrix
In [64]: i = sum(list_tuples, ()) # row indices
In [65]: j = np.repeat(range(len(list_tuples)), [len(t) for t in list_tuples])
In [66]: c = coo_matrix((np.ones(len(i), dtype=int), (i, j)))
In [67]: c
Out[67]:
<5x3 sparse matrix of type '<class 'numpy.int64'>'
with 9 stored elements in COOrdinate format>
In [68]: c.A
Out[68]:
array([[1, 1, 0],
[0, 0, 1],
[1, 1, 0],
[0, 1, 1],
[1, 0, 1]])