I would like to filter a data frame containing many users' attempts on some tests. I have sorted the data frame by user ID and date. The problem is that I don't know how to keep, for each user, only the rows up to that user's maximum score. I want to drop the rows that come after the maximum point for every user.
For example:
| user | score | date |
|---|---|---|
| A | 5 | 2021-11-14 10:22:13.854 |
| A | 7 | 2021-11-14 10:25:03.044 |
| B | 4 | 2021-11-16 19:01:42.005 |
| B | 7 | 2021-11-16 19:04:21.859 |
| B | 6 | 2021-11-16 19:06:52.372 |
I want to filter the data frame so that for user B only the first two rows are kept (the third row comes after this user's maximum score of 7, so it should be dropped).
The result would be:
| user | score | date |
|---|---|---|
| A | 5 | 2021-11-14 10:22:13.854 |
| A | 7 | 2021-11-14 10:25:03.044 |
| B | 4 | 2021-11-16 19:01:42.005 |
| B | 7 | 2021-11-16 19:04:21.859 |
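For a reproducible example, the input and expected tables above can be rebuilt as pandas DataFrames. This is a sketch: the column names come from the tables, and parsing the date column with pd.to_datetime is an assumption about how the timestamps are stored.

```python
import pandas as pd

# Input table from the question (assumption: dates parsed as datetimes)
df = pd.DataFrame({
    "user": ["A", "A", "B", "B", "B"],
    "score": [5, 7, 4, 7, 6],
    "date": pd.to_datetime([
        "2021-11-14 10:22:13.854",
        "2021-11-14 10:25:03.044",
        "2021-11-16 19:01:42.005",
        "2021-11-16 19:04:21.859",
        "2021-11-16 19:06:52.372",
    ]),
})

# Sort by user, then date, and renumber rows 0..n-1
df = df.sort_values(["user", "date"]).reset_index(drop=True)

# Expected result: user B's last row (score 6) comes after B's maximum (7)
expected = df.iloc[:4]
print(expected)
```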
This should work:
df.groupby('user').apply(lambda g: g.head(g['score'].argmax() + 1)).reset_index(drop=True)
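How it works:
- first, group by the user/ID
- then, for each group, get the position of the maximum score with argmax() (if the maximum occurs more than once, it picks the first occurrence)
- finally, return the rows up to and including that position with head(position + 1); this relies on the rows being sorted by date within each user, as in the question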
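The same rule (keep each row whose position within its user group is at most the position of the first maximum score) can also be written without groupby.apply, using groupby.cumcount and groupby transform. This is a minimal sketch, assuming the frame is already sorted by user and date as in the question; pos and max_pos are just illustrative names.

```python
import pandas as pd

# Sample data from the question, sorted by user and date
df = pd.DataFrame({
    "user": ["A", "A", "B", "B", "B"],
    "score": [5, 7, 4, 7, 6],
    "date": pd.to_datetime([
        "2021-11-14 10:22:13.854",
        "2021-11-14 10:25:03.044",
        "2021-11-16 19:01:42.005",
        "2021-11-16 19:04:21.859",
        "2021-11-16 19:06:52.372",
    ]),
}).sort_values(["user", "date"]).reset_index(drop=True)

# Position of each row within its user's group: 0, 1, 2, ...
pos = df.groupby("user").cumcount()

# Position of the first maximum score in each group, broadcast to every row
max_pos = df.groupby("user")["score"].transform(lambda s: s.to_numpy().argmax())

# Keep rows at or before the maximum; drop everything after it
result = df[pos <= max_pos].reset_index(drop=True)
print(result)
```

Because argmax returns the first position of the maximum, ties behave like the one-liner above: rows after the first occurrence of the top score are dropped.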