Learning Experiments
In a series of learning experiments, I would like to count the number of participants in each experiment that improved their performance in subsequent experiments (Rank 1 is highest). In addition, I would also like to count the number of participants in each experiment that subsequently reached the top rank.
Here is a short, sanitized version of the learning experiment CSV file that I have loaded into a pandas DataFrame (df_learning).
Experiment | Subject | Rank |
---|---|---|
A | Alpha | 1 |
A | Bravo | 2 |
A | Charlie | 3 |
A | Delta | 4 |
A | Echo | 5 |
B | Alpha | 1 |
B | Charlie | 2 |
B | Echo | 3 |
B | Foxtrot | 4 |
B | Golf | 5 |
B | India | 6 |
B | Juliet | 7 |
C | Juliet | 1 |
C | Bravo | 2 |
C | Charlie | 3 |
Please advise?
CodePudding user response:
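You can use groupby.cummax to get each subject's worst (highest-numbered) rank so far, then boolean indexing to keep the rows where the current rank is strictly better than that (df here is the question's df_learning, with rows ordered by experiment):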
m = df['Rank'].sub(df.groupby('Subject')['Rank'].cummax()).lt(0)
improved_rank = df.loc[m, 'Subject'].unique()
output: ['Charlie', 'Echo', 'Juliet']
reached_top_rank = df.loc[m & df['Rank'].eq(1), 'Subject'].unique()
output: ['Juliet']
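
For anyone who wants to try this end to end, here is a self-contained sketch that rebuilds the sample table and applies the same cummax idea. The variable names (worst_so_far, improved, improved_per_experiment) are only illustrative, and the per-experiment count at the end is one possible reading of the question: it counts subjects in the experiment where the improvement was observed. It also assumes the rows are already sorted in experiment order, as in the sample.

```python
import pandas as pd

# Rebuild the sample data from the question (rank 1 is best).
rows = [
    ("A", "Alpha", 1), ("A", "Bravo", 2), ("A", "Charlie", 3),
    ("A", "Delta", 4), ("A", "Echo", 5),
    ("B", "Alpha", 1), ("B", "Charlie", 2), ("B", "Echo", 3),
    ("B", "Foxtrot", 4), ("B", "Golf", 5), ("B", "India", 6),
    ("B", "Juliet", 7),
    ("C", "Juliet", 1), ("C", "Bravo", 2), ("C", "Charlie", 3),
]
df_learning = pd.DataFrame(rows, columns=["Experiment", "Subject", "Rank"])

# Worst (largest) rank each subject has had up to and including this row.
# Assumption: rows are ordered by experiment, as in the sample.
worst_so_far = df_learning.groupby("Subject")["Rank"].cummax()

# A row counts as an improvement when its rank beats that worst rank.
improved = df_learning["Rank"] < worst_so_far

improved_subjects = df_learning.loc[improved, "Subject"].unique().tolist()
top_subjects = (
    df_learning.loc[improved & df_learning["Rank"].eq(1), "Subject"]
    .unique()
    .tolist()
)

# One possible per-experiment count (an assumption about what is wanted):
# distinct subjects whose improvement shows up in that experiment.
improved_per_experiment = (
    df_learning.loc[improved].groupby("Experiment")["Subject"].nunique()
)

print(improved_subjects)
print(top_subjects)
print(improved_per_experiment)
```

Note that the comparison is against the worst earlier rank, not the immediately preceding one, so Charlie's rank 3 in experiment C is not flagged even though it follows a rank 2 in B. If you need a strict experiment-to-experiment comparison, groupby('Subject')['Rank'].diff() would be a closer fit.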