Home > Mobile >  Group By : Remove groups(rows) based on condition
Group By : Remove groups(rows) based on condition

Time:08-15

I have a dataframe df :-

ID Date Event
3 04-09-2013 Success
3 27-09-2013 Success
3 14-10-2013 Success
3 15-10-2013 Email
3 18-10-2013 Form
3 19-10-2013 Click

If the first event for every group of ID is Success or followed by Consecutive Success(note strictly start with and /or followed by consecutive Success )then I want to drop the ID from the dataframe(Drop all the rows for that ID from the df)

Also if the case where the group is this way :-

ID Date Event
4 04-09-2013 Success
4 27-09-2013 Success
4 14-10-2013 Email
4 15-10-2013 Email
4 18-10-2013 Success
4 18-10-2013 Success
4 18-10-2013 Click

Expected output to be (i.e a.)remove rows if the first event/consecutive events are Success and also b.)drop the events(rows) after the last/latest success within each group(taken one example of Event click but if there are more events those are to be dropped as well):

ID Date Event
4 14-10-2013 Email
4 15-10-2013 Email
4 18-10-2013 Success
4 18-10-2013 Success

b.) is valid for every group irrespective of how the Event starts( for example ) :-

ID Date Event
5 14-10-2013 Email
5 15-10-2013 Click
5 18-10-2013 Success
5 18-10-2013 Email

Expected :-

ID Date Event
5 14-10-2013 Email
5 15-10-2013 Click
5 18-10-2013 Success

How can this be achieved?

CodePudding user response:

Use:

#remove values after last Success per groups   not remove only non Success groups
df = df.iloc[::-1]
m = df['Event'].eq('Success')
df  = df[~m.groupby(df['ID']).transform('all') | m.groupby(df['ID']).cummax()].iloc[::-1]

#remove first conscutive Success per groups
m1 = df['Event'].ne(df.groupby(df['ID'])['Event'].shift()).groupby(df['ID']).cumsum().ne(1)
#alternative solution
#m1 = df['Event'].mask(df['Event'].eq('Success')).groupby(df['ID']).ffill().notna()
m2 = df['Event'].ne('Success')

df = df[m1 | m2]
print (df)
    ID        Date    Event
8    4  14-10-2013    Email
9    4  15-10-2013    Email
10   4  18-10-2013  Success
11   4  18-10-2013  Success
13   5  14-10-2013    Email
14   5  15-10-2013    Click
15   5  18-10-2013  Success
  • Related