My dataframe has multiple rows per user, like this:
User | Value |
---|---|
A | 12 |
A | 5 |
B | 3 |
C | 7 |
D | 50 |
D | 1 |
I want to draw a sample that keeps all rows for each selected user. Say A and C are randomly selected; then I would have:
User | Value |
---|---|
A | 12 |
A | 5 |
C | 7 |
How can I do this in Python?
CodePudding user response:
You can randomly sample the unique users, then use isin and boolean indexing:
df[df['User'].isin(df['User'].drop_duplicates().sample(n=2))]
Or with numpy:
df[df['User'].isin(np.random.choice(df['User'].unique(), 2, replace=False))]
Example output:
User Value
3 C 7
4 D 50
5 D 1
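For reference, here is a minimal self-contained sketch of the first variant, assuming the dataframe from the question. The names picked and result are only illustrative, and random_state is added purely so the run is repeatable; drop it to get a different pair of users each time.

```python
import pandas as pd

# Dataframe from the question.
df = pd.DataFrame({
    "User": ["A", "A", "B", "C", "D", "D"],
    "Value": [12, 5, 3, 7, 50, 1],
})

# Draw 2 distinct users (random_state is an assumption, used for repeatability).
picked = df["User"].drop_duplicates().sample(n=2, random_state=0)

# Boolean indexing: keep every row that belongs to one of the picked users.
result = df[df["User"].isin(picked)]
print(result)
```

Because the sampling happens on the deduplicated User column, each user is either kept with all of its rows or dropped entirely.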
CodePudding user response:
With .sample():
df.sample(n=3)
index | User | Value |
---|---|---|
1 | A | 5 |
0 | A | 12 |
4 | D | 50 |
There are many other parameters you can set; see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sample.html
With n= you define how many rows you want.
With replace=True or False, you allow or disallow sampling of the same row more than once.
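A small sketch of how those two parameters behave, assuming the dataframe from the question. The variable names and the random_state value are illustrative only.

```python
import pandas as pd

# Dataframe from the question.
df = pd.DataFrame({
    "User": ["A", "A", "B", "C", "D", "D"],
    "Value": [12, 5, 3, 7, 50, 1],
})

# replace=False (the default): 3 distinct rows are drawn.
no_repeat = df.sample(n=3, replace=False, random_state=1)

# replace=True: n may exceed the number of rows, so rows can repeat.
with_repeat = df.sample(n=10, replace=True, random_state=1)

print(no_repeat)
print(with_repeat)
```

Note that df.sample draws individual rows, so a user can appear with only some of its rows in the result.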