Given an example dataframe which holds data such as that below:
ID Field_1 Field_2 Group
1 ABC XYZ B
2 BCD ABF B
3 EEJ KYA B
..
12 KAS UUY Z
13 OEP PLO Z
..
84 HJH HIE N
85 YSU SAR N
How can one get a random, ordered selection, such that a call to a method/lambda with a desired sequence of groups, e.g. [B, Z, N, B], would retrieve randomly selected rows matching that sequence, in that order?
I've seen previous answers which draw random rows per group, but the returned selection is not ordered. See, for example, the earlier question "Python: Random selection per group".
CodePudding user response:
One simple method would be to generate a dictionary of groups and use a list comprehension with sample and pandas.concat:
order = ['B', 'Z', 'N', 'B']
d = dict(list(df.groupby('Group')))
df_sample = pd.concat([d[k].sample(1) for k in order])
output:
ID Field_1 Field_2 Group
1 2 BCD ABF B
4 13 OEP PLO Z
5 84 HJH HIE N
2 3 EEJ KYA B
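NB. each entry in order is sampled independently, so a group that appears more than once (here B) can return the same row twice; in effect this is sampling with replacement.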
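If a repeated group must never return the same row twice, one possible variation is to count how often each group appears in the sequence, draw that many distinct rows per group in a single sample call, and then hand them out in the requested order. This is only a sketch: the toy dataframe and the names counts, pools and picked are illustrative and not part of the answer above, and it assumes the index labels of df are unique.

```python
from collections import Counter

import pandas as pd

# Toy data mirroring the question's layout (illustrative values only).
df = pd.DataFrame({
    "ID": [1, 2, 3, 12, 13, 84, 85],
    "Field_1": ["ABC", "BCD", "EEJ", "KAS", "OEP", "HJH", "YSU"],
    "Field_2": ["XYZ", "ABF", "KYA", "UUY", "PLO", "HIE", "SAR"],
    "Group": ["B", "B", "B", "Z", "Z", "N", "N"],
})

order = ["B", "Z", "N", "B"]

# How many rows each group has to supply in total.
counts = Counter(order)

# Draw all rows needed from each group in one call. sample() defaults to
# replace=False, so the rows drawn within a group are distinct. It raises
# ValueError if a group has fewer rows than requested.
pools = {
    key: iter(df[df["Group"].eq(key)].sample(n).index)
    for key, n in counts.items()
}

# Walk the requested order, taking the next unused index label per group.
picked = [next(pools[key]) for key in order]
df_sample = df.loc[picked]
print(df_sample)
```

Because the rows are looked up by label with df.loc, the result follows the sequence in order rather than the original row order of df.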
CodePudding user response:
An alternative without groupby is to filter the rows with boolean indexing and call DataFrame.sample on each subset:
order = ['B', 'Z', 'N', 'B']
df1 = pd.concat([df[df['Group'].eq(k)].sample(1) for k in order])
print(df1)
ID Field_1 Field_2 Group
2 3 EEJ KYA B
4 13 OEP PLO Z
6 85 YSU SAR N
1 2 BCD ABF B
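Since the question asks for something callable, the filtering approach could be wrapped in a small helper. The following is a minimal sketch, not an established API: the function name sample_in_order and its group_col and seed parameters are made up for illustration. It passes one shared numpy RandomState to every sample call, so a fixed seed gives a reproducible result while repeated groups still get separate draws.

```python
import numpy as np
import pandas as pd


def sample_in_order(df, order, group_col="Group", seed=None):
    """Return one randomly chosen row per entry of `order`, in that order."""
    # One RandomState shared by all draws: its state advances on every call,
    # so a repeated group does not simply replay the previous draw.
    rng = np.random.RandomState(seed)
    rows = [
        df[df[group_col].eq(key)].sample(1, random_state=rng)
        for key in order
    ]
    return pd.concat(rows)


# Toy data mirroring the question's layout (illustrative values only).
df = pd.DataFrame({
    "ID": [1, 2, 3, 12, 13, 84, 85],
    "Field_1": ["ABC", "BCD", "EEJ", "KAS", "OEP", "HJH", "YSU"],
    "Field_2": ["XYZ", "ABF", "KYA", "UUY", "PLO", "HIE", "SAR"],
    "Group": ["B", "B", "B", "Z", "Z", "N", "N"],
})

result = sample_in_order(df, ["B", "Z", "N", "B"], seed=0)
print(result)
```

As with the one-liner, each entry is drawn independently, so the two B rows may coincide.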