I have a csv file with the following structure
Name | Position |
---|---|
Albert | Producer |
John | Director |
Anna | Volunteer |
Gaston | Tech |
other 51 names | other 51 positions |
I need to create random groups where each group has one member from each position. There are only four positions across the dataset. The final result would be something like
Name | Position | Group |
---|---|---|
Albert | Producer | Group 1 |
John | Director | Group 1 |
Anna | Volunteer | Group 1 |
Gaston | Tech | Group 1 |
Martin | Producer | Group 2 |
Vanessa | Director | Group 2 |
Charles | Volunteer | Group 2 |
Milan | Tech | Group 2 |
What is the most efficient way to create these groups?
Thank you in advance!
CodePudding user response:
You can use two groupby
. One to shuffle the Positions using sample(frac=1)
, and the other to assign the groups:
(df.groupby('Position').pipe(lambda d: d.sample(frac=1))
.assign(Group=lambda d: d.groupby('Position').cumcount().add(1))
.sort_values(by=['Group', 'Position']) # optional
)
example output:
Name Position Group
1 John Director 1
4 Martin Producer 1
3 Gaston Tech 1
6 Charles Volunteer 1
5 Vanessa Director 2
0 Albert Producer 2
7 Milan Tech 2
2 Anna Volunteer 2