I have a dataframe
df = pd.DataFrame([["A","13-02-2022","B","FALSE"],["A","13-02-2022","C","FALSE"],["A","14-02-2022","D","FALSE"],
["A","14-02-2022","E","FALSE"],["A","16-02-2022","A","TRUE"],["A","16-02-2022","F","FALSE"],
["A","17-02-2022","G","FALSE"],["A","17-02-2022","H","FALSE"],["A","18-02-2022","I","FALSE"],
["A","18-02-2022","J","FALSE"]],columns=["id1","date","id2","flag"])
id1 date id2 flag
A 13-02-2022 B FALSE
A 13-02-2022 C FALSE
A 14-02-2022 D FALSE
A 14-02-2022 E FALSE
A 16-02-2022 A TRUE
A 16-02-2022 F FALSE
A 17-02-2022 G FALSE
A 17-02-2022 H FALSE
A 18-02-2022 I FALSE
A 18-02-2022 J FALSE
I want to remove all the rows of previous working day, next working day and the day where flag is TRUE.
For example here 16th Feb flag is TRUE, so remove all the rows of previous working day 14th Feb, next working day 17th Feb and 16th Feb. If TRUE is in last day of month 28th Feb where next working day is not there, then remove the rows of TRUE flag day and previous working day only.
Expected Output:
df_out = pd.DataFrame([["A","13-02-2022","B","FALSE"],["A","13-02-2022","C","FALSE"],["A","18-02-2022","I","FALSE"],
["A","18-02-2022","J","FALSE"]],columns=["id1","date","id2","flag"])
id1 date id2 flag
A 13-02-2022 B FALSE
A 13-02-2022 C FALSE
A 18-02-2022 I FALSE
A 18-02-2022 J FALSE
How to do it?
CodePudding user response:
You can use boolean indexing:
# ensure boolean and datetime
df['flag'] = df['flag'].eq('TRUE')
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
bday = pd.offsets.BusinessDay(1)
drop = pd.concat([dates bday, dates-bday])
out = df[~(df['date'].isin(drop) | df['flag'])]
Output:
id1 date id2 flag
0 A 2022-02-13 B False
1 A 2022-02-13 C False
2 A 2022-02-14 D False
3 A 2022-02-14 E False
5 A 2022-02-16 F False
8 A 2022-02-18 I False
9 A 2022-02-18 J False
CodePudding user response:
You can try to create a filter data frame and select everything which is not in it:
df['date'] = pd.to_datetime(df['date'], format="%d-%m-%Y")
dates = df[df.flag == 'TRUE']['date']
to_drop = pd.concat([dates, dates pd.offsets.BusinessDay(1), dates - pd.offsets.BusinessDay(1)])
df_out = df[~df['date'].isin(to_drop)]
df_out