Home > Net >  Remove the rows of dataframe based on date and flag condition in pandas
Remove the rows of dataframe based on date and flag condition in pandas

Time:01-18

I have a dataframe

df = pd.DataFrame([["A","13-02-2022","B","FALSE"],["A","13-02-2022","C","FALSE"],["A","14-02-2022","D","FALSE"],
                   ["A","14-02-2022","E","FALSE"],["A","16-02-2022","A","TRUE"],["A","16-02-2022","F","FALSE"],
                   ["A","17-02-2022","G","FALSE"],["A","17-02-2022","H","FALSE"],["A","18-02-2022","I","FALSE"],
                   ["A","18-02-2022","J","FALSE"]],columns=["id1","date","id2","flag"])
id1   date     id2  flag
A   13-02-2022  B   FALSE
A   13-02-2022  C   FALSE
A   14-02-2022  D   FALSE
A   14-02-2022  E   FALSE
A   16-02-2022  A   TRUE
A   16-02-2022  F   FALSE
A   17-02-2022  G   FALSE
A   17-02-2022  H   FALSE
A   18-02-2022  I   FALSE
A   18-02-2022  J   FALSE

I want to remove all the rows of previous working day, next working day and the day where flag is TRUE.

For example here 16th Feb flag is TRUE, so remove all the rows of previous working day 14th Feb, next working day 17th Feb and 16th Feb. If TRUE is in last day of month 28th Feb where next working day is not there, then remove the rows of TRUE flag day and previous working day only.

Expected Output:

df_out = pd.DataFrame([["A","13-02-2022","B","FALSE"],["A","13-02-2022","C","FALSE"],["A","18-02-2022","I","FALSE"],
                       ["A","18-02-2022","J","FALSE"]],columns=["id1","date","id2","flag"])
id1   date     id2  flag
A   13-02-2022  B   FALSE
A   13-02-2022  C   FALSE
A   18-02-2022  I   FALSE
A   18-02-2022  J   FALSE

How to do it?

CodePudding user response:

You can use boolean indexing:

# ensure boolean and datetime
df['flag'] = df['flag'].eq('TRUE')
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

bday = pd.offsets.BusinessDay(1)

drop = pd.concat([dates bday, dates-bday])

out = df[~(df['date'].isin(drop) | df['flag'])]

Output:

  id1       date id2   flag
0   A 2022-02-13   B  False
1   A 2022-02-13   C  False
2   A 2022-02-14   D  False
3   A 2022-02-14   E  False
5   A 2022-02-16   F  False
8   A 2022-02-18   I  False
9   A 2022-02-18   J  False

CodePudding user response:

You can try to create a filter data frame and select everything which is not in it:

df['date'] = pd.to_datetime(df['date'], format="%d-%m-%Y")

dates = df[df.flag == 'TRUE']['date']
to_drop = pd.concat([dates, dates   pd.offsets.BusinessDay(1), dates - pd.offsets.BusinessDay(1)])
df_out = df[~df['date'].isin(to_drop)]
df_out
  • Related