My sample df looks like this:
id year success
1 2000 N
1 2001 N
1 2002 Y
1 2003 N
1 2004 N
2 2000 N
2 2001 N
2 2002 N
3 2000 N
3 2001 N
3 2002 Y
....
Here, we can see that id==1 and id==3 have both success==Y and success==N, but id==2 only has success==N.
What I want to do:
- Within each group, once we find the first success==Y, keep that row but drop the remaining rows for that group, e.g. id==1.
This is how the new df should look:
id year success
1 2000 N
1 2001 N
1 2002 Y
2 2000 N
2 2001 N
2 2002 N
3 2000 N
3 2001 N
3 2002 Y
....
Here, in the above df, we removed the extra rows after we encountered success==Y. Since id==2 does not have success==Y, we did not remove any rows there, and for id==3 the last row is the Y, so no rows were removed either.
What I tried:
- I grouped by id, but I still want all the rows per group even though the ids are duplicated, so a plain groupby aggregation did not work.
Could someone please help me achieve this result?
CodePudding user response:
First compare the success column against the value Y, which gives you True wherever success=='Y'. To mark all rows after the first Y in each group as drop candidates, apply cummax to this condition per group and shift it down one row (so the Y row itself stays False and is kept). Finally, use the negated condition to filter the data frame:
df[~df.success.eq('Y').groupby(df.id).apply(lambda g: g.cummax().shift(fill_value=False))]
id year success
0 1 2000 N
1 1 2001 N
2 1 2002 Y
5 2 2000 N
6 2 2001 N
7 2 2002 N
8 3 2000 N
9 3 2001 N
10 3 2002 Y
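For reference, here is a self-contained sketch of the same cummax-and-shift idea, with the sample data reconstructed from the question (the DataFrame literal is an assumption). It uses transform instead of apply, which is an equivalent way to broadcast the per-group mask back to the original index:

```python
import pandas as pd

# Sample data reconstructed from the question (assumed values)
df = pd.DataFrame({
    'id':      [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'year':    [2000, 2001, 2002, 2003, 2004,
                2000, 2001, 2002, 2000, 2001, 2002],
    'success': ['N', 'N', 'Y', 'N', 'N',
                'N', 'N', 'N', 'N', 'N', 'Y'],
})

# cummax turns the boolean "is Y" flag into "a Y has occurred so far";
# shifting by one row keeps the first Y row itself out of the drop mask.
drop_mask = (df['success'].eq('Y')
               .groupby(df['id'])
               .transform(lambda s: s.cummax().shift(fill_value=False)))

result = df[~drop_mask]
print(result)
```

With the sample data this keeps three rows per id, matching the output shown above.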
CodePudding user response:
Try this:
(df.loc[(df.index <= df['id'].map(df['success'].eq('Y').groupby(df['id']).idxmax())) |
df['success'].eq('N').groupby(df['id']).transform('all')])
or
(df.loc[df.groupby('id')['success']
.transform(lambda x: x.eq('Y')
.cumsum()
.shift()
.fillna(0)
.eq(0))])
Output:
id year success
0 1 2000 N
1 1 2001 N
2 1 2002 Y
5 2 2000 N
6 2 2001 N
7 2 2002 N
8 3 2000 N
9 3 2001 N
10 3 2002 Y
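For completeness, the second variant above can be checked end to end against the sample data (the DataFrame literal below is reconstructed from the question and is an assumption). The logic: within each group, count how many Y values have occurred so far, shift that count down one row, and keep a row only while the shifted count is still zero, i.e. no Y has appeared on an earlier row of the same group:

```python
import pandas as pd

# Sample data reconstructed from the question (assumed values)
df = pd.DataFrame({
    'id':      [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'year':    [2000, 2001, 2002, 2003, 2004,
                2000, 2001, 2002, 2000, 2001, 2002],
    'success': ['N', 'N', 'Y', 'N', 'N',
                'N', 'N', 'N', 'N', 'N', 'Y'],
})

# cumsum counts Y occurrences so far; shift + fillna(0) delays the count
# by one row so the first Y row itself still sees a count of 0 and is kept.
keep = (df.groupby('id')['success']
          .transform(lambda x: x.eq('Y')
                                .cumsum()
                                .shift()
                                .fillna(0)
                                .eq(0)))

result = df.loc[keep]
print(result)
```

Both variants in this answer produce the same nine-row result as the first answer.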