I have data like this:
Task | ID | Status |
---|---|---|
Task1 | 123 | Open |
Task2 | 123 | Closed |
Task3 | 211 | Closed |
Task4 | 211 | Closed |
Task5 | 564 | Closed |
Task6 | 994 | Open |
I want to delete rows with the same IDs that have 'Open' status. In other words I want to delete all IDs that have 'Open' status.
The end result will be like this:
Task | ID | Status |
---|---|---|
Task3 | 211 | Closed |
Task4 | 211 | Closed |
Task5 | 564 | Closed |
Data:
{'Task': ['Task1', 'Task2', 'Task3', 'Task4', 'Task5', 'Task6'],
'ID': [123, 123, 211, 211, 564, 994],
'Status': ['Open', 'Closed', 'Closed', 'Closed', 'Closed', 'Open']}
CodePudding user response:
We could take the Open Statuses and groupby
cummax
to create a boolean filter.
The idea is if a Status is Open, we flag its corresponding ID as True for all rows it appears, then we filter out all such rows:
out = df[~df['Status'].eq('Open').groupby(df['ID']).cummax()]
Output:
Task ID Status
2 Task3 211 Closed
3 Task4 211 Closed
4 Task5 564 Closed