Select random ordered sequence from dataframe

Time:12-04

Given an example dataframe which holds data such as that below:

ID Field_1 Field_2 Group
1    ABC     XYZ     B
2    BCD     ABF     B
3    EEJ     KYA     B
..
12   KAS     UUY     Z
13   OEP     PLO     Z
..
84   HJH     HIE     N
85   YSU     SAR     N

How can one get a random, ordered sequence of rows, so that calling a method/lambda with a desired group sequence such as [B, Z, N, B] returns one randomly selected row per entry, in that order?

I've seen previous answers that select random rows per group, but the returned selection is not ordered. One such answer is: Python: Random selection per group

CodePudding user response:

One simple method would be to generate a dictionary of groups and use a list comprehension with sample and pandas.concat:

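import pandas as pd
order = ['B', 'Z', 'N', 'B']
# map each group label to its sub-DataFrame
d = dict(list(df.groupby('Group')))
# draw one random row per entry, in the given order
df_sample = pd.concat([d[k].sample(1) for k in order])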

output:

   ID Field_1 Field_2 Group
1   2     BCD     ABF     B
4  13     OEP     PLO     Z
5  84     HJH     HIE     N
2   3     EEJ     KYA     B

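NB. each entry of order is sampled independently, so a group that appears more than once (B here) can return the same row twice; in that sense this is sampling with replacement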
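
If repeated groups should never return the same row, one option is to count how often each group occurs in the sequence, draw that many rows per group in a single sample call (which does not reuse rows by default), and then hand the rows out in the requested order. This is only a sketch: the function name sample_ordered_unique, its parameters and the small example frame are assumptions made for illustration and are not part of the answer above.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's example.
df = pd.DataFrame({
    "ID": [1, 2, 3, 12, 13, 84, 85],
    "Field_1": ["ABC", "BCD", "EEJ", "KAS", "OEP", "HJH", "YSU"],
    "Field_2": ["XYZ", "ABF", "KYA", "UUY", "PLO", "HIE", "SAR"],
    "Group": ["B", "B", "B", "Z", "Z", "N", "N"],
})


def sample_ordered_unique(df, order, group_col="Group", random_state=None):
    """Return one row per entry of `order`, never reusing a row."""
    rng = np.random.RandomState(random_state)
    # Draw all rows needed for each group at once; .sample() defaults to
    # replace=False, so rows within one group are distinct.
    pools = {
        key: df[df[group_col].eq(key)].sample(n=n, random_state=rng)
        for key, n in Counter(order).items()
    }
    # Hand the drawn rows out one at a time, following `order`.
    taken = Counter()
    rows = []
    for key in order:
        rows.append(pools[key].iloc[[taken[key]]])
        taken[key] += 1
    return pd.concat(rows)


order = ["B", "Z", "N", "B"]
df_unique = sample_ordered_unique(df, order, random_state=0)
print(df_unique)
```

With this variant, asking for more rows of a group than the group contains raises a ValueError from sample instead of silently repeating a row.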

CodePudding user response:

Alternatively, filter the rows with boolean indexing and call DataFrame.sample on each selection, which avoids groupby:

order = ['B', 'Z', 'N', 'B']
df1 = pd.concat([df[df['Group'].eq(k)].sample(1) for k in order])
print(df1)

output:

   ID Field_1 Field_2 Group
2   3     EEJ     KYA     B
4  13     OEP     PLO     Z
6  85     YSU     SAR     N
1   2     BCD     ABF     B
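
Since the question asks for something callable, the filtering approach above can be wrapped in a small function. The sketch below is one way to do it, not a definitive implementation: the name sample_in_order, the group_col and random_state parameters, and the example frame are assumptions added for illustration. It shares a single NumPy RandomState across all draws so that results are reproducible for a given seed.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's example.
df = pd.DataFrame({
    "ID": [1, 2, 3, 12, 13, 84, 85],
    "Field_1": ["ABC", "BCD", "EEJ", "KAS", "OEP", "HJH", "YSU"],
    "Field_2": ["XYZ", "ABF", "KYA", "UUY", "PLO", "HIE", "SAR"],
    "Group": ["B", "B", "B", "Z", "Z", "N", "N"],
})


def sample_in_order(df, order, group_col="Group", random_state=None):
    """Return one random row per entry of `order`, in that order."""
    # One shared RandomState: passing the same integer seed to every
    # .sample() call would make a repeated group pick the same row each time.
    rng = np.random.RandomState(random_state)
    return pd.concat(
        [df[df[group_col].eq(key)].sample(1, random_state=rng) for key in order]
    )


result = sample_in_order(df, ["B", "Z", "N", "B"], random_state=42)
print(result)
```

As with the original one-liner, a group that occurs twice in the sequence may still yield the same row twice, because every entry is drawn independently.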