I have created the following pandas dataframe:
df=
userID movieID timesWatched
u1 mv1 5
u1 mv2 2
u2 mv1 1
u3 mv4 30
I also have a list of 6 movies: movies = ['mv0', 'mv1', 'mv2', 'mv3', 'mv4', 'mv5']
What I would like to do is to create for every user a list like this:
u1 : [0, 5, 2, 0, 0, 0]
u2 : [0, 1, 0, 0, 0, 0]
u3 : [0, 0, 0, 0, 30, 0]
Is there a nice pythonic / pandas way of doing this that avoids confusing for loops?
CodePudding user response:
You can use categorical data with pivot_table, then convert the result with to_dict using the "list" format.
The dropna=False option of pivot_table, combined with categorical data, ensures that every category is kept, even one whose values are all NaN.
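import pandas as pd

movies = ['mv0', 'mv1', 'mv2', 'mv3', 'mv4', 'mv5']
# a categorical movieID lets pivot_table see the movies nobody watched
(df.assign(movieID=pd.Categorical(df['movieID'], categories=movies))
   .pivot_table(index='movieID',
                columns='userID',
                values='timesWatched',
                # observed=False keeps unused categories; newer pandas may not default to it
                dropna=False, fill_value=0, observed=False)
   .to_dict('list')
)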
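To make it easier to try, here is a minimal self-contained sketch of the same idea. The DataFrame construction is my reconstruction of the question's sample data. The `aggfunc="sum"`, `observed=False` and `.astype(int)` parts are my own additions (assumptions, not part of the answer above), there to keep the result as integers and to keep unused categories regardless of the pandas default.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "userID": ["u1", "u1", "u2", "u3"],
    "movieID": ["mv1", "mv2", "mv1", "mv4"],
    "timesWatched": [5, 2, 1, 30],
})
movies = ["mv0", "mv1", "mv2", "mv3", "mv4", "mv5"]

table = (
    # Categorical with explicit categories: mv0, mv3 and mv5 exist as
    # categories even though no row mentions them
    df.assign(movieID=pd.Categorical(df["movieID"], categories=movies))
      .pivot_table(
          index="movieID",
          columns="userID",
          values="timesWatched",
          aggfunc="sum",     # assumption: sum instead of the default mean
          dropna=False,      # do not drop rows that are entirely empty
          fill_value=0,
          observed=False,    # assumption: explicitly keep unused categories
      )
)

# One list per user, ordered like `movies`
result = table.astype(int).to_dict("list")
print(result)
```

The rows of `table` follow the category order, so each list lines up with `movies` position by position.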
CodePudding user response:
Use DataFrame.pivot with DataFrame.reindex, replace the missing values with 0, and convert to a dictionary with DataFrame.to_dict:
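movies = ['mv0', 'mv1', 'mv2', 'mv3', 'mv4', 'mv5']
# keyword arguments: recent pandas versions no longer accept these positionally
d = (df.pivot(index='movieID', columns='userID', values='timesWatched')
       .reindex(movies)   # one row per movie, in list order (NaN where unwatched)
       .fillna(0)
       .astype(int)
       .to_dict('list'))
print(d)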
{'u1': [0, 5, 2, 0, 0, 0], 'u2': [0, 1, 0, 0, 0, 0], 'u3': [0, 0, 0, 0, 30, 0]}
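If you need this in more than one place, the pivot/reindex steps could be wrapped in a small helper. This is only a sketch: the function name `watch_vectors` is hypothetical, and the sample frame is my reconstruction of the question's data.

```python
import pandas as pd

def watch_vectors(df, movies):
    """Return {userID: [timesWatched per movie, ordered like `movies`]}.

    Hypothetical helper, not a pandas function.
    """
    wide = df.pivot(index="movieID", columns="userID", values="timesWatched")
    # reindex adds a NaN row for every movie missing from df
    # and drops any movie that is not in `movies`
    wide = wide.reindex(movies).fillna(0).astype(int)
    return wide.to_dict("list")

# Sample data reconstructed from the question
df = pd.DataFrame({
    "userID": ["u1", "u1", "u2", "u3"],
    "movieID": ["mv1", "mv2", "mv1", "mv4"],
    "timesWatched": [5, 2, 1, 30],
})
movies = ["mv0", "mv1", "mv2", "mv3", "mv4", "mv5"]

vectors = watch_vectors(df, movies)
print(vectors)
```

Note that DataFrame.pivot raises a ValueError if the same (userID, movieID) pair appears more than once. In that case pivot_table with an explicit aggfunc is the safer choice.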