Pandas data frame change structure

I have a dataframe for movie ratings like this:

I want a new data frame that holds the data as a sequence for each user, where each item in the sequence is a vector of a movieId and its rating. It should look something like this:

userId    moviesandratings
1        [[296,5],[306,3.5],[307,5],etc]

for each user.

CodePudding user response:

You can create a new column filled with lists and then aggregate those lists with GroupBy.agg:

df['new'] = df[['movieId','rating']].to_numpy().tolist()

df1 = df.groupby('userId')['new'].agg(list).reset_index(name='moviesandratings')
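
For anyone who wants to try this without the original data, here is a minimal sketch of the same approach. The sample rows are made up for illustration (only the first user's values come from the question), and the column names userId, movieId and rating are taken from the code above. Note that to_numpy() on mixed integer and float columns converts everything to a common float dtype, so movieId should come out as 296.0 rather than 296.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2],
    "movieId": [296, 306, 307, 1, 296],
    "rating": [5.0, 3.5, 5.0, 4.0, 2.5],
})

# Build one [movieId, rating] list per row.
# to_numpy() upcasts both columns to float, so movieId becomes a float.
df["new"] = df[["movieId", "rating"]].to_numpy().tolist()

# Collect the per-row lists into one list per user.
df1 = df.groupby("userId")["new"].agg(list).reset_index(name="moviesandratings")

print(df1)
```

The helper column new stays in df afterwards; drop it with df.drop(columns="new") if it is not needed.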

Or use GroupBy.apply:

df1 = (df.groupby('userId')[['movieId','rating']]
         .apply(lambda x: x.to_numpy().tolist())
         .reset_index(name='moviesandratings'))
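
As a rough, self-contained sketch of the GroupBy.apply variant: the sample rows below are hypothetical, and the second part is not from the answer but a small variation, assuming you would rather keep movieId as an integer. It pairs the two columns with zip instead of going through to_numpy(), which avoids the conversion of both columns to float.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2],
    "movieId": [296, 306, 307, 1, 296],
    "rating": [5.0, 3.5, 5.0, 4.0, 2.5],
})

# The answer's approach: one nested list per user, built inside apply.
df1 = (df.groupby("userId")[["movieId", "rating"]]
         .apply(lambda x: x.to_numpy().tolist())
         .reset_index(name="moviesandratings"))

# Variation (an assumption, not part of the answer): zip the columns
# so each value keeps the type of its own column.
df2 = (df.groupby("userId")[["movieId", "rating"]]
         .apply(lambda x: [list(pair) for pair in zip(x["movieId"], x["rating"])])
         .reset_index(name="moviesandratings"))

print(df1)
print(df2)
```

Unlike the first solution, neither version adds a helper column to df.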