Transforming dataframe to sparse matrix and reset index

Time:07-02

I have a data set of ratings given by users (user ID) to products (product ID). There are only 5,000 products and 10,000 users, but the IDs are not consecutive numbers. I would like to transform my dataframe into a sparse COO matrix, coo_matrix((data, (row, col)), shape=...), where row and col run over the actual number of users and products rather than over the raw IDs. Is there any way to do that? Below is an illustration:

Data frame:

User ID  Product ID  Rating
1        14          0.1
1        15          0.2
2        14          0.3
2        16          0.3
5        19          0.4

and I expect to get this matrix (in sparse COO form):

ProductID   14   15   16   19
UserID
1          0.1  0.2    0    0
2          0.3    0  0.3    0
5            0    0    0  0.4

because normally coo_matrix would give a very large matrix, indexed by every value up to the largest ID: (1, 2, ..., 19) for product ID and (1, 2, 3, 4, 5) for user ID.

Please help me. This is for my thesis, which is due in 3 days, and I only just found this error. I am coding in Python.

Thank you very much!

CodePudding user response:

Hi, hope this helps, and good luck with your thesis:

import pandas as pd
from scipy.sparse import coo_matrix

dataframe=pd.DataFrame(data={'User ID':[1,1,2,2,5], 'Product ID':[14,15,14,16,19], 'Rating':[0.1,0.2,0.3,0.3,0.4]})

row=dataframe['User ID']
col=dataframe['Product ID']
data=dataframe['Rating']

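#Build the COO matrix from the raw IDs and convert it to a dense array.
#Its shape is inferred as (max User ID + 1, max Product ID + 1).
dense=coo_matrix((data, (row, col))).toarray()
new_dataframe=pd.DataFrame(dense)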

#Drop columns for non-existent Product IDs (optional; delete if not intended)
new_dataframe=new_dataframe.loc[:, (new_dataframe!=0).any(axis=0)]

#Drop rows for non-existent User IDs (optional; delete if not intended)
new_dataframe=new_dataframe.loc[(new_dataframe!=0).any(axis=1)]

print(new_dataframe)

Output:

    14   15   16   19
1  0.1  0.2  0.0  0.0
2  0.3  0.0  0.3  0.0
5  0.0  0.0  0.0  0.4
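
The approach above goes through a dense array, which can get large when the IDs are big numbers. If the goal is to keep the result sparse, one possible alternative is to map each ID to a compact position before building the matrix. The sketch below assumes pandas.factorize is acceptable for that mapping; the names df, row_codes, col_codes, user_ids, product_ids and ratings are illustrative and not from the original post.

```python
import pandas as pd
from scipy.sparse import coo_matrix

# Same sample data as in the question.
df = pd.DataFrame({
    "User ID": [1, 1, 2, 2, 5],
    "Product ID": [14, 15, 14, 16, 19],
    "Rating": [0.1, 0.2, 0.3, 0.3, 0.4],
})

# factorize maps each distinct ID to a position 0..n-1.
# sort=True keeps the positions in ascending ID order.
row_codes, user_ids = pd.factorize(df["User ID"], sort=True)
col_codes, product_ids = pd.factorize(df["Product ID"], sort=True)

# Build the sparse matrix on the compact positions, so its shape is
# (number of distinct users, number of distinct products).
ratings = coo_matrix(
    (df["Rating"].to_numpy(), (row_codes, col_codes)),
    shape=(len(user_ids), len(product_ids)),
)

print(ratings.shape)         # (3, 4)
print(user_ids.tolist())     # [1, 2, 5]
print(product_ids.tolist())  # [14, 15, 16, 19]
```

Here user_ids[i] and product_ids[j] give back the original IDs for row i and column j, so the labels are not lost. Note that if the same (user, product) pair appears more than once, the duplicates are summed when the COO matrix is converted to another format or to a dense array.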