Convert dictionary sparse matrix to numpy sparse matrix

Time:10-06

I am relatively new to NumPy and sparse matrices. I am trying to convert my data to a sparse matrix, given the following instructions:

If you take the current format and read the data in as a dictionary, then you can easily convert the feature-value mappings to vectors (you can choose these vectors to be sparse matrices).

Given a pandas DataFrame as follows:

    sentiment  tweet_id                                              tweet
    0       neg         1  [(3083, 0.4135918197208131), (3245, 0.79102943...
    1       neg         2  [(679, 0.4192120119709425), (1513, 0.523940563...
    2       neg         3  [(225, 0.5013098541806313), (1480, 0.441928325...

I converted it to a dictionary:

sparse_mat = {
    (0, 3083): 0.4135918197208131, 
    (0, 3245): 0.7910294373931178, 
    (0, 4054): 0.4507928968357355, 
    (1, 679): 0.4192120119709425, 
    (1, 1513): 0.5239405639724402, 
    (1, 2663): 0.2689391233917331, 
    (1, 3419): 0.5679685442982928, 
    (1, 4442): 0.39348577488961367, 
    (2, 225): 0.5013098541806313, 
    (2, 1480): 0.44192832578442043, 
    (2, 2995): 0.3209783438156829, 
    (2, 3162): 0.4897198689787062, 
    (2, 3551): 0.2757628355961508, 
    (2, 3763): 0.3667287774412633
}

From my understanding this is a valid sparse matrix. I want to store it as a SciPy sparse object, say a csr_matrix. I tried to run the following code:

csr_matrix(sparse_mat)

Which gives this error:

TypeError: no supported conversion for types: (dtype('O'),)

How can I go about this? Am I missing something?

CodePudding user response:

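From the docs (https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html): csr_matrix does not accept a dictionary directly, but it does accept a (data, (row, col)) tuple, where data[k] is stored at position (row[k], col[k]). So split the dictionary keys into row and column lists: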

from scipy.sparse import csr_matrix   

d = {
    (0, 3083): 0.4135918197208131, 
    (0, 3245): 0.7910294373931178, 
    (0, 4054): 0.4507928968357355, 
    (1, 679): 0.4192120119709425, 
    (1, 1513): 0.5239405639724402, 
    (1, 2663): 0.2689391233917331, 
    (1, 3419): 0.5679685442982928, 
    (1, 4442): 0.39348577488961367, 
    (2, 225): 0.5013098541806313, 
    (2, 1480): 0.44192832578442043, 
    (2, 2995): 0.3209783438156829, 
    (2, 3162): 0.4897198689787062, 
    (2, 3551): 0.2757628355961508, 
    (2, 3763): 0.3667287774412633
}

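# Split the (row, col) keys into separate index lists; keys() and values()
# iterate in the same order, so data stays aligned with row and col.
keys = d.keys()
row = [k[0] for k in keys]
col = [k[1] for k in keys]
data = list(d.values())

# Shape is inferred from the largest row and column index.
sparse_arr = csr_matrix((data, (row, col)))
# Optional: dense copy for inspection only; it allocates every zero.
arr = sparse_arr.toarray()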
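
One thing to watch: without a shape argument, the matrix is only as wide as the largest column index that happens to appear, so two batches of tweets can end up with different widths. A minimal sketch of a helper that takes an explicit shape is below. The name dict_to_csr, its shape parameter, and the vocabulary size of 5000 are assumptions made for illustration; they do not come from the question or from SciPy.

```python
from scipy.sparse import csr_matrix


def dict_to_csr(entries, shape=None):
    """Build a CSR matrix from a {(row, col): value} dictionary.

    Hypothetical helper for illustration; not part of SciPy.
    """
    if not entries:
        # With no entries there is nothing to infer the shape from.
        if shape is None:
            raise ValueError("shape is required for an empty dictionary")
        return csr_matrix(shape, dtype=float)

    # Split the (row, col) keys into two parallel sequences.
    rows, cols = zip(*entries.keys())
    values = list(entries.values())
    return csr_matrix((values, (rows, cols)), shape=shape)


# A few entries taken from the dictionary in the question.
d = {
    (0, 3083): 0.4135918197208131,
    (1, 679): 0.4192120119709425,
    (2, 225): 0.5013098541806313,
}

inferred = dict_to_csr(d)                # width follows the largest column index
fixed = dict_to_csr(d, shape=(3, 5000))  # 5000 is an assumed vocabulary size
```

Passing the same shape for every batch keeps the column count consistent, which matters if the matrices are later stacked or fed to the same model.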