I have a dataframe for movie ratings like this:
I want a new data frame that holds the data as a sequence for each user, where each item in the sequence is a vector of a movieId and its rating. It should look something like this:
userId moviesandratings
1 [[296,5],[306,3.5],[307,5],etc]
for each user.
CodePudding user response:
You can create a new column filled with lists and then aggregate the lists with GroupBy.agg:
df['new'] = df[['movieId','rating']].to_numpy().tolist()
df1 = df.groupby('userId')['new'].agg(list).reset_index(name='moviesandratings')
Or use GroupBy.apply:
df1 = (df.groupby('userId')[['movieId','rating']]
         .apply(lambda x: x.to_numpy().tolist())
         .reset_index(name='moviesandratings'))
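To try this end to end, here is a minimal sketch on made-up data. The rows for user 1 reuse the values from the question; user 2 and the name df2 are assumptions added for illustration. It also shows one thing to watch for: to_numpy() on a mix of integer and float columns upcasts everything to float, so movieId comes back as 296.0 rather than 296. If that matters, zipping plain Python lists is one possible way to keep the integers.

```python
import pandas as pd

# Hypothetical sample data; user 1's rows reuse the values from the question.
df = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2],
    "movieId": [296, 306, 307, 1, 296],
    "rating": [5.0, 3.5, 5.0, 4.0, 2.5],
})

# Pair each movieId with its rating, row by row.
# Note: to_numpy() upcasts the int/float mix to float, so 296 becomes 296.0.
df["new"] = df[["movieId", "rating"]].to_numpy().tolist()

# Collect the pairs into one list per user.
df1 = df.groupby("userId")["new"].agg(list).reset_index(name="moviesandratings")

# Variant (an assumption, not part of the answer above): build the pairs from
# plain Python lists so movieId stays an int.
pairs = [list(p) for p in zip(df["movieId"].tolist(), df["rating"].tolist())]
df2 = (df.assign(new=pairs)
         .groupby("userId")["new"].agg(list)
         .reset_index(name="moviesandratings"))

print(df1)
print(df2)
```

Both results have one row per userId; they differ only in whether movieId is stored as a float or an int inside each pair.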