Using groupby with idxmax to find values after a certain condition

I'm trying to groupby a column and slice to find all rows after certain conditions are met. I've been trying to make it work with idxmax as shown in this example, and can get what I need for simple conditions, but not for multiple conditions.

import pandas as pd

df = pd.DataFrame({
    'ID':[1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4],
    'A':[1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4],
    'B':[1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4]
})

    ID  A  B
0    1  1  1
1    1  2  1
2    1  3  1
3    1  4  1
4    2  1  2
5    2  2  2
6    2  3  2
7    2  4  2
8    3  1  3
9    3  2  3
10   3  3  3
11   3  4  3
12   4  1  4
13   4  2  4
14   4  3  4
15   4  4  4

Using this example data frame, I'd like to group by 'ID', then within each group keep every row from the first one where A == B and A >= 2 onward. Groups that never meet the condition should be dropped. So this would be the desired output:

   ID  A  B
0   2  2  2
1   2  3  2
2   2  4  2
3   3  3  3
4   3  4  3
5   4  4  4

I tried the following code expecting it to work, but I get an "unhashable type: 'slice'" error.

df.groupby('ID')[((df['A'] == df['B']) & (df['A'] >= 2)).idxmax():]

I've toyed around with trying to do that a few different ways either using .loc or .values but keep getting errors. Is there an easy way to do this in one line that I'm missing or would I need to set up a small function to accomplish this?

CodePudding user response:

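We can use groupby cummax on the boolean condition to select every row from the first match onward within each group. Once the mask turns True in a group, the cumulative maximum keeps it True for the rest of that group, and groups with no match stay False throughout.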

m = df['A'].eq(df['B']) & df['A'].ge(2)
df[m.groupby(df['ID']).cummax()]

Result

    ID  A  B
5    2  2  2
6    2  3  2
7    2  4  2
10   3  3  3
11   3  4  3
15   4  4  4
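
To see why this works, it can help to look at the mask before and after cummax. The following is a minimal sketch using the question's data; the names `keep`, `steps`, and `result` are only illustrative and do not appear in the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
    'A':  [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    'B':  [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
})

# Row-wise condition: True only on the rows where A == B and A >= 2.
m = df['A'].eq(df['B']) & df['A'].ge(2)

# Cumulative max per ID: after the first True in a group,
# every later row of that group is True as well.
keep = m.groupby(df['ID']).cummax()

# Side-by-side view of the raw condition and the extended mask.
steps = df.assign(m=m, keep=keep)
print(steps)

# Boolean indexing with the extended mask gives the wanted rows.
result = df[keep]
print(result)
```

Group 1 never satisfies the condition, so its mask stays False on every row and the whole group is left out.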

CodePudding user response:

First, build the condition: A == B and A >= 2. Then group that boolean mask by ID and use cummax to extend the first True to all succeeding rows in each group:

filtered = df[(df['A'].eq(df['B']) & df['A'].ge(2)).groupby(df['ID']).cummax()]

Output:

>>> filtered
    ID  A  B
5    2  2  2
6    2  3  2
7    2  4  2
10   3  3  3
11   3  4  3
15   4  4  4
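
For comparison, here is a rough sketch of the idxmax route the question was attempting, written as an explicit loop over the groups. The names `parts` and `by_idxmax` are made up for illustration. One thing to watch for: idxmax on an all-False mask returns the first label of the group, so groups without a match need an explicit check. The final reset_index(drop=True) is there only to reproduce the 0..5 index shown in the question's desired output.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
    'A':  [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    'B':  [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
})

parts = []
for _, group in df.groupby('ID'):
    # Condition evaluated on this group's rows only.
    cond = group['A'].eq(group['B']) & group['A'].ge(2)
    # Skip groups with no match; idxmax alone would not signal this.
    if cond.any():
        # idxmax gives the label of the first True; slice from there on.
        parts.append(group.loc[cond.idxmax():])

# Stitch the per-group slices together and renumber the rows.
by_idxmax = pd.concat(parts).reset_index(drop=True)
print(by_idxmax)
```

This selects the same rows as the cummax one-liner, which stays shorter and avoids the Python-level loop.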