Random sampling users

Time:06-08

My dataframe has multiple rows per user, like this:

User Value
A 12
A 5
B 3
C 7
D 50
D 1

I want to draw a random sample of users and keep all rows for each selected user. For example, if A and C are randomly selected, I would get:

User Value
A 12
A 5
C 7

How can I do this in Python?

CodePudding user response:

You can randomly sample the unique users, then use isin and boolean indexing to keep all of their rows:

df[df['User'].isin(df['User'].drop_duplicates().sample(n=2))]

Or with NumPy (imported as np):

df[df['User'].isin(np.random.choice(df['User'].unique(), 2, replace=False))]

Example output (here C and D were drawn):

  User  Value
3    C      7
4    D     50
5    D      1
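
For anyone who wants to run this end to end, here is a minimal sketch built on the drop_duplicates/isin approach above. The helper name sample_users and its seed argument are assumptions made for illustration and are not part of the original answer; seed is simply passed to pandas' random_state so the draw can be repeated.

```python
import pandas as pd

# Rebuild the dataframe from the question.
df = pd.DataFrame({
    "User": ["A", "A", "B", "C", "D", "D"],
    "Value": [12, 5, 3, 7, 50, 1],
})


def sample_users(frame, n_users, seed=None):
    """Return every row belonging to n_users randomly chosen users.

    Hypothetical helper: `seed` is forwarded to `random_state` so the
    same users are drawn on every run (pass None for a fresh draw).
    """
    # One entry per user, so each user is equally likely to be picked
    # regardless of how many rows they have.
    chosen = frame["User"].drop_duplicates().sample(n=n_users, random_state=seed)
    # Boolean indexing keeps all rows of the chosen users.
    return frame[frame["User"].isin(chosen)]


result = sample_users(df, n_users=2, seed=0)
print(result)
```

Sampling from the de-duplicated User column, rather than from the rows, is what keeps users with many rows from being picked more often.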

CodePudding user response:

With .sample(), which draws individual rows rather than whole users:

df.sample(n=3)

Example output:

  User  Value
1    A      5
0    A     12
4    D     50

DataFrame.sample accepts several other parameters; see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sample.html

n= sets how many rows to return.

replace=True allows the same row to be sampled more than once; replace=False (the default) does not.
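
As a rough illustration of how n and replace interact, the sketch below samples rows from the question's dataframe both ways. The random_state values are arbitrary and are only there to make the draws repeatable; they are not part of the original answer.

```python
import pandas as pd

# Same six-row dataframe as in the question.
df = pd.DataFrame({
    "User": ["A", "A", "B", "C", "D", "D"],
    "Value": [12, 5, 3, 7, 50, 1],
})

# Without replacement: three distinct rows, so n cannot exceed len(df).
without_repl = df.sample(n=3, replace=False, random_state=1)

# With replacement: rows may repeat, so n may be larger than len(df).
with_repl = df.sample(n=10, replace=True, random_state=1)

print(without_repl)
print(with_repl)
```

Note that both calls sample rows, so a user's rows can be split up; to keep users whole, sample the user labels first and filter with isin.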
