I'm trying to groupby a column and slice to find all rows after certain conditions are met. I've been trying to make it work with idxmax as shown in this example, and can get what I need for simple conditions, but not for multiple conditions.
import pandas as pd

df = pd.DataFrame({
'ID':[1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4],
'A':[1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4],
'B':[1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4]
})
ID A B
0 1 1 1
1 1 2 1
2 1 3 1
3 1 4 1
4 2 1 2
5 2 2 2
6 2 3 2
7 2 4 2
8 3 1 3
9 3 2 3
10 3 3 3
11 3 4 3
12 4 1 4
13 4 2 4
14 4 3 4
15 4 4 4
Using this example data frame, I'd like to group by 'ID', then keep, within each group, every row from the first one where 'A' == 'B' and 'A' >= 2 onward (that first matching row included). So this would be the desired output:
ID A B
0 2 2 2
1 2 3 2
2 2 4 2
3 3 3 3
4 3 4 3
5 4 4 4
I tried the following code expecting it to work, but I get an unhashable type: 'slice' error.
df.groupby('ID')[((df['A'] == df['B']) & (df['A'] >= 2)).idxmax():]
I've toyed around with trying to do that a few different ways, using either .loc or .values, but keep getting errors. Is there an easy way to do this in one line that I'm missing, or would I need to set up a small function to accomplish this?
CodePudding user response:
We can use groupby with cummax on the boolean condition to select, within each group, all the rows from the first one where the condition is met:
m = df['A'].eq(df['B']) & df['A'].ge(2)
df[m.groupby(df['ID']).cummax()]
Result
ID A B
5 2 2 2
6 2 3 2
7 2 4 2
10 3 3 3
11 3 4 3
15 4 4 4
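To see why this works, here is a minimal step-by-step sketch of the same idea on the question's data. The names keep and result_reset are only illustrative, and the astype(bool) call is a defensive assumption in case the grouped cummax does not come back as a boolean Series on a given pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
    'A':  [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    'B':  [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
})

# Row-wise condition: True only where A equals B and A is at least 2.
m = df['A'].eq(df['B']) & df['A'].ge(2)

# Within each ID, the running maximum stays False until the first True,
# then stays True for every later row of that group.
# astype(bool) is a defensive step (assumption); it changes nothing
# when the result is already boolean.
keep = m.groupby(df['ID']).cummax().astype(bool)

# Boolean indexing keeps the original row labels.
result = df[keep]

# Optional: renumber the rows 0..n-1 to match the question's desired output.
result_reset = result.reset_index(drop=True)
```

Group 1 drops out entirely because its mask never becomes True: A equals B only on the row where A is 1, which fails the A >= 2 part.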
CodePudding user response:
First, build your condition: A == B and A >= 2. Then group that by ID and use cummax to extend it to all succeeding rows per group:
filtered = df[(df['A'].eq(df['B']) & df['A'].ge(2)).groupby(df['ID']).cummax()]
Output:
>>> filtered
ID A B
5 2 2 2
6 2 3 2
7 2 4 2
10 3 3 3
11 3 4 3
15 4 4 4
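If you would rather stay close to the idxmax-style idea from the question, a small function along these lines might also do it. This is only a sketch: rows_from_first_match is a hypothetical helper name, it assumes the DataFrame index is unique, and it needs an explicit any() guard because the first-True lookup on an all-False group would otherwise point at that group's first row.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
    'A':  [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    'B':  [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
})


def rows_from_first_match(frame, mask, key):
    """Hypothetical helper: per group of `key`, keep the rows from the
    first True in `mask` to the end of the group."""
    pieces = []
    for _, group_mask in mask.groupby(frame[key]):
        # Skip groups where the condition is never met.
        if not group_mask.any():
            continue
        # Position of the first True inside this group.
        first_pos = group_mask.to_numpy().argmax()
        # Labels of that row and every later row of the group.
        labels = group_mask.index[first_pos:]
        pieces.append(frame.loc[labels])
    # Return an empty frame with the same columns if nothing matched.
    return pd.concat(pieces) if pieces else frame.iloc[0:0]


mask = df['A'].eq(df['B']) & df['A'].ge(2)
filtered_alt = rows_from_first_match(df, mask, 'ID')
```

The cummax one-liner is shorter and avoids the Python-level loop, so this version is mainly useful if you need extra per-group logic.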