How to group by user id to a specific row?

I am trying to select a specific number of unique user ids from a dataset.

Let's say I want about 200,000 rows out of 10M. I want only 1,500 unique user ids, with around 200,000 rows in total (the row count does not need to be exact; being off by a few thousand is okay). Each user has multiple ratings.

Here's the dataset link.

Here is how I load the data:

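import pandas as pd

names = ['user_id', 'movie_id', 'rating', 'timestamp']
# '::' is a multi-character separator, so use the Python engine (avoids a ParserWarning)
df = pd.read_csv('ratings.csv', sep='::', names=names, engine='python')
print(df)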

Is there any way to group it like that with pandas?

CodePudding user response:

I didn't test this on the real dataset, but the logic should be something like:

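import numpy as np

# select 1500 unique users at random
users = np.random.choice(df['user_id'].unique(), size=1500, replace=False)

# keep only those users' rows, then draw up to 200k random rows
# (sample() raises ValueError if n exceeds the number of available rows)
subset = df[df['user_id'].isin(users)]
df_sample = subset.sample(n=min(200000, len(subset)))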

Documentation: numpy.random.choice and pandas.DataFrame.sample
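
Since the question says the row count only needs to be approximate, the same idea can be wrapped in a small reusable function that caps the row count instead of requiring an exact number. This is only a sketch: the function name sample_users, its parameters, and the synthetic DataFrame are assumptions made for illustration and are not part of the original dataset or answer.

```python
import numpy as np
import pandas as pd


def sample_users(df, n_users=1500, max_rows=200_000, seed=0):
    """Keep rows for n_users randomly chosen users, capped at max_rows rows."""
    rng = np.random.default_rng(seed)
    unique_users = df['user_id'].unique()
    # Cannot draw more distinct users than exist in the data.
    n_users = min(n_users, len(unique_users))
    users = rng.choice(unique_users, size=n_users, replace=False)
    subset = df[df['user_id'].isin(users)]
    # Downsample only when the subset exceeds the cap; sample() raises
    # ValueError when n is larger than the number of available rows.
    if len(subset) > max_rows:
        subset = subset.sample(n=max_rows, random_state=seed)
    return subset


# Synthetic stand-in for ratings.csv (hypothetical data, not the real dataset).
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    'user_id': rng.integers(1, 5001, size=100_000),
    'movie_id': rng.integers(1, 1001, size=100_000),
    'rating': rng.integers(1, 6, size=100_000),
})
result = sample_users(demo, n_users=150, max_rows=2_000, seed=0)
print(result['user_id'].nunique(), len(result))
```

Note that sampling rows after filtering can, in principle, drop every rating of a user who has very few rows, so the final number of unique users may end up slightly below n_users. If every selected user must be kept, skipping the downsampling step and accepting the resulting row count is probably the simpler choice.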