Filtering all the rows until a max is reached in each group

Time:12-13

I would like to filter a data frame containing many users' attempts on some tests. I have sorted the data frame by user ID and date. The problem is that I don't know how to keep only the rows up to each user's maximum score. I want to drop the rows that come after the maximum for every user.

For example:

| user | score | date                    |
| A    | 5     | 2021-11-14 10:22:13.854 |
| A    | 7     | 2021-11-14 10:25:03.044 |
| B    | 4     | 2021-11-16 19:01:42.005 |
| B    | 7     | 2021-11-16 19:04:21.859 |
| B    | 6     | 2021-11-16 19:06:52.372 |

I want to filter the data frame so that for user B only the first two rows are kept (since the third row comes after the maximum for this user and is lower than it).

The result would be:

| user | score | date                    |
| A    | 5     | 2021-11-14 10:22:13.854 |
| A    | 7     | 2021-11-14 10:25:03.044 |
| B    | 4     | 2021-11-16 19:01:42.005 |
| B    | 7     | 2021-11-16 19:04:21.859 |

CodePudding user response:

This should work:

df.groupby('user').apply(lambda g: g.head(g['score'].argmax() + 1)).reset_index(drop=True)

Because:

  • first, group by the user/ID
  • then, for each group, get the position of the maximum score with argmax() (if the maximum occurs more than once, it picks the first occurrence)
  • and return the rows up to and including that row, which is why 1 is added to the position
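The same steps can also be sketched without `apply`, using only column-wise group operations. This is an alternative formulation, not the one-liner from the answer: it rebuilds the example table from the question, and the names `group_max`, `is_max` and `max_seen_before` are made up for illustration. It assumes that, as in the answer, only the first occurrence of a tied maximum should count.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame(
    {
        "user": ["A", "A", "B", "B", "B"],
        "score": [5, 7, 4, 7, 6],
        "date": pd.to_datetime(
            [
                "2021-11-14 10:22:13.854",
                "2021-11-14 10:25:03.044",
                "2021-11-16 19:01:42.005",
                "2021-11-16 19:04:21.859",
                "2021-11-16 19:06:52.372",
            ]
        ),
    }
)

# The approach relies on rows being ordered by user and date.
df = df.sort_values(["user", "date"]).reset_index(drop=True)

# Per-row copy of each user's maximum score.
group_max = df.groupby("user")["score"].transform("max")

# 1 where the row holds the user's maximum, 0 elsewhere.
is_max = (df["score"] == group_max).astype(int)

# Number of maximum rows that appear strictly before the current row
# within the same user (running count minus the current row's own flag).
max_seen_before = is_max.groupby(df["user"]).cumsum() - is_max

# Keep every row up to and including the first maximum of its user.
result = df[max_seen_before == 0].reset_index(drop=True)
print(result)
```

Avoiding `apply` keeps the `user` column handling independent of how a given pandas version passes grouping columns to the applied function, and it tends to scale better when there are many users.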