Python Pandas: turn a dataframe into a list or vector of counts

I have created the following pandas dataframe:

df=

userID movieID   timesWatched    
u1     mv1       5           
u1     mv2       2           
u2     mv1       1          
u3     mv4       30

I also have a list of 6 movies: movies = ['mv0', 'mv1', 'mv2', 'mv3', 'mv4', 'mv5']

What I would like to do is to create for every user a list like this:

u1 : [0, 5, 2, 0, 0, 0]
u2 : [0, 1, 0, 0, 0, 0]
u3 : [0, 0, 0, 0, 30, 0]

Is there a nice pythonic / pandas way of doing this, avoiding confusing for loops?

CodePudding user response：

You can convert movieID to categorical data, reshape with pivot_table, and then call to_dict with the "list" format.

Combining categorical data with the dropna=False option of pivot_table keeps every category in the result, even the ones whose entries are all NaN.

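import pandas as pd

movies = ['mv0', 'mv1', 'mv2', 'mv3', 'mv4', 'mv5']

# Declare every movie as a category so that unwatched movies survive the pivot.
(df.assign(movieID=pd.Categorical(df['movieID'], categories=movies))
   .pivot_table(index='movieID',
                columns='userID',
                values='timesWatched',
                dropna=False, fill_value=0)
   .to_dict('list')  # one list per user, ordered like `movies`
)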
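
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the categorical approach. The DataFrame is rebuilt from the table in the question. Note that aggfunc="sum", observed=False and the final astype(int) are my own additions (assumptions, not part of the answer above), added so that unobserved categories are kept explicitly and the counts should come out as plain integers.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "userID": ["u1", "u1", "u2", "u3"],
    "movieID": ["mv1", "mv2", "mv1", "mv4"],
    "timesWatched": [5, 2, 1, 30],
})
movies = ["mv0", "mv1", "mv2", "mv3", "mv4", "mv5"]

counts = (
    # Make movieID categorical so every movie in `movies` is a known category.
    df.assign(movieID=pd.Categorical(df["movieID"], categories=movies))
      .pivot_table(
          index="movieID",
          columns="userID",
          values="timesWatched",
          aggfunc="sum",     # assumption: sum repeated (user, movie) rows
          dropna=False,      # keep categories that have no data at all
          fill_value=0,
          observed=False,    # assumption: ask for unobserved categories explicitly
      )
      .astype(int)
      .to_dict("list")       # {user: [count for each movie, in category order]}
)
print(counts)
```

Because the rows follow the category order, each list lines up position by position with the movies list.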

CodePudding user response：

Use DataFrame.pivot with DataFrame.reindex, replace the missing values with 0, and convert the result to a dictionary with DataFrame.to_dict:

movies = ['mv0', 'mv1', 'mv2', 'mv3', 'mv4', 'mv5']

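# DataFrame.pivot requires keyword arguments in pandas 2.0 and later.
d = (df.pivot(index='movieID', columns='userID', values='timesWatched')
       .reindex(movies)   # add rows for movies nobody watched
       .fillna(0)
       .astype(int)
       .to_dict('list'))
print(d)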
{'u1': [0, 5, 2, 0, 0, 0], 'u2': [0, 1, 0, 0, 0, 0], 'u3': [0, 0, 0, 0, 30, 0]}
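
If an actual vector (a NumPy array) per user is needed rather than a dictionary of lists, the same pivot and reindex idea can be turned around so that users are rows and movies are columns. This is only a sketch, assuming the DataFrame from the question and that each (userID, movieID) pair appears at most once, since DataFrame.pivot raises an error on duplicates. The names wide, matrix and u1_vector are illustrative, not from the answers above.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "userID": ["u1", "u1", "u2", "u3"],
    "movieID": ["mv1", "mv2", "mv1", "mv4"],
    "timesWatched": [5, 2, 1, 30],
})
movies = ["mv0", "mv1", "mv2", "mv3", "mv4", "mv5"]

# One row per user, one column per movie, in the order given by `movies`.
wide = (
    df.pivot(index="userID", columns="movieID", values="timesWatched")
      .reindex(columns=movies, fill_value=0)  # add movies nobody watched
      .fillna(0)                              # movies this user did not watch
      .astype(int)
)

matrix = wide.to_numpy()              # 2-D array, shape (n_users, n_movies)
u1_vector = wide.loc["u1"].to_numpy() # 1-D count vector for a single user
print(wide)
```

Keeping the result as a DataFrame (wide) preserves the user and movie labels, while matrix is convenient when the counts feed straight into numerical code.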