Home > OS >  Filtering Dataframe with the multiple conditions with groupby
Filtering Dataframe with the multiple conditions with groupby

Time:09-22

Existing Dataframe :

ID       Activity     Action No.    Status  
A1       register         1           in
A1       fill form        2           in
A2       Payment Done     3         success
A2       fill form        2           in
B1       fill form        1           in
B1       Payment Done     2         success
B1       Process_drop     3           in
B1       fill form        1           in

Expected Dataframe :

ID       Activity     Action No.    Status  
A1       register         1           in
A1       fill form        2           in
A2       Payment Done     3         success
B1       Payment Done     2         success

Need to drop the remaining rows for each ID , once Activity == Payment Done and Status == success for that particular ID.( i.e. further rows needs to be removed once the condition is achieved)

i approached by grouping the ID and applying the conditions on the columns , but not sure about dropping the rows

CodePudding user response:

cond = df.Activity.eq("Payment Done") & df.Status.eq("success")
mask = cond.groupby(df.ID).cummax().shift(fill_value=False)
new  = df[~mask]
  • get the terminating condition across frame
  • group that True/False condition by IDs
  • take the cumulative maximum; once True always True
  • shift it because you want to drop after
  • negate it so that we keep False's, i.e., including & before terminating condition
In [115]: new
Out[115]:
   ID      Activity  Action No.   Status
0  A1      register           1       in
1  A1     fill form           2       in
2  A2  Payment Done           3  success
5  B1  Payment Done           2  success

CodePudding user response:

You can use boolean indexing.

You can determine if there is any match per group and drop the rows:

# is the row matching your criteria?
m = df['Activity'].eq('Payment Done') & df['Status'].eq('success')
# is there NOT any match in the group?
m2 = ~m.groupby(df['ID']).transform('any')

# keep rows matching either of the above
df[m|m2]

output:

   ID      Activity  Action No.   Status
0  A1      register           1       in
1  A1     fill form           2       in
2  A2  Payment Done           3  success
5  B1  Payment Done           2  success

intermediates:

   ID      Activity  Action No.   Status      m     m2   m|m2
0  A1      register           1       in  False   True   True
1  A1     fill form           2       in  False   True   True
2  A2  Payment Done           3  success   True  False   True
3  A2     fill form           2       in  False  False  False
4  B1     fill form           1       in  False  False  False
5  B1  Payment Done           2  success   True  False   True
6  B1  Process_drop           3       in  False  False  False
7  B1     fill form           1       in  False  False  False
  • Related