Home > Software design >  Pandas: filtering rows based on condition
Pandas: filtering rows based on condition

Time:09-22

Existing Dataframe :

Id           Date            Status
A         26-01-2022         begin
A         26-01-2022         failed
A         27-01-2022         begin
A         27-01-2022         in-process
A         27-01-2022         success
B         01-02-2022         in-process
B         01-02-2022         success
B         02-02-2022         begin
B         02-02-2022         failed

Expected Dataframe :

Id           Date            Status
A         27-01-2022         begin
A         27-01-2022         in-process
A         27-01-2022         success
B         01-02-2022         in-process
B         01-02-2022         success

Need to keep all the activity of the Id where the status for that Id ends with success for that particular date.

tried grouping Id and Date, but stuck with dropping the relevant rows

CodePudding user response:

Use transform('last'):

df[df.groupby(['Id', 'Date']).Status.transform('last') == 'success']

CodePudding user response:

You can use groupby and filter by last value of Status column

out = (df.groupby(['Id', 'Date'])
       .filter(lambda g: g['Status'].iloc[-1] == 'success'))
print(out)

  Id        Date      Status
2  A  27-01-2022       begin
3  A  27-01-2022  in-process
4  A  27-01-2022     success
5  B  01-02-2022  in-process
6  B  01-02-2022     success
  • Related