How can I select a random sequence of n rows for each group in a pandas data frame?

Time:09-26

Suppose I have the following data frame:

import pandas as pd

raw_data = {
    'subject_id': ['1', '1', '1', '1', '2', '2', '2', '2', '2'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Brian', 'Bob', 'Bill', 'Brenda', 'Brett']}
df = pd.DataFrame(raw_data, columns=['subject_id', 'first_name'])

How can I select a random sequence of n consecutive rows from df for each subject_id? For example, for a sequence of 2 rows per subject_id, a possible output would be:

subject_id   first_name
1            Amy
1            Allen
2            Brenda
2            Brett

The most similar existing post seems to be:

select a random sequence of rows from pandas dataframe

However, this does not seem to take into account the grouping that I need to do.

CodePudding user response:

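This takes a little extra work after sample: draw 2 rows per group, keep the lowest sampled index in each group as the starting row, then take that row together with the one after it. This assumes the default integer index, with each group's rows stored contiguously.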

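s = df.groupby('subject_id')['subject_id'].sample(n=2)
# Keep the lowest sampled index in each group as the start of the sequence
idx = s.sort_index().drop_duplicates().index
# Take each start row plus the row right after it
s = df.loc[idx.union(idx + 1)]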
Out[53]: 
  subject_id first_name
2          1      Allen
3          1      Alice
4          2      Brian
5          2        Bob
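
For values of n other than 2, one possible generalization is to pick a random start position inside each group and slice n consecutive rows from there. The following is only a minimal sketch: the helper name sample_consecutive and its seed parameter are hypothetical (they do not come from the answer above), and it assumes every group has at least n rows.

```python
import numpy as np
import pandas as pd


def sample_consecutive(df, group_col, n, seed=None):
    """Return n consecutive rows from a random start position in each group.

    Hypothetical helper for illustration; raises if a group has fewer than n rows.
    """
    rng = np.random.default_rng(seed)
    pieces = []
    # Iterating over the groupby yields (key, sub-DataFrame) pairs.
    for key, group in df.groupby(group_col, sort=False):
        if len(group) < n:
            raise ValueError(f"group {key!r} has {len(group)} rows, need {n}")
        # The upper bound is exclusive, so start + n never runs past the group.
        start = rng.integers(0, len(group) - n + 1)
        pieces.append(group.iloc[start:start + n])
    # Concatenating keeps the original row labels of the selected rows.
    return pd.concat(pieces)


raw_data = {
    'subject_id': ['1', '1', '1', '1', '2', '2', '2', '2', '2'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Brian', 'Bob', 'Bill', 'Brenda', 'Brett']}
df = pd.DataFrame(raw_data, columns=['subject_id', 'first_name'])

out = sample_consecutive(df, 'subject_id', n=2, seed=0)
print(out)
```

Because the slice is positional within each group, this version does not depend on the index being a default integer range, although the rows it returns are consecutive in the order they appear within the group.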