I have this pandas dataframe:
I want the following:
If, within a single day, a row where condition_2 is 'True' comes BEFORE a row where condition_1 is 'True', then the condition_2 value in that row should be changed to NaN.
Dataframe structure: condition_1 and condition_2 are never both 'True' in the same row.
So, for the dataframe above, this is what the result should look like:
This is sample code:
import pandas as pd
from datetime import datetime
tbl = {"date" :["2022-02-27", "2022-02-27", "2022-02-27", "2022-02-28", "2022-02-28","2022-02-28"],
"condition_1" : ["True", "NaN", "NaN", "NaN", "True", "NaN"],
"condition_2" : ["NaN", "NaN", "True", "True", "NaN", "NaN"]}
df = pd.DataFrame(tbl)
df = df.replace('NaN', float('nan'))
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')  # assign the result back; to_datetime does not modify df in place
df.sort_values(by = "date", inplace=True)
Any ideas? Maybe I can use a for loop and if conditions?
CodePudding user response:
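You can group by the date column and compare the conditions using shift: within each day, a condition_2 row is flagged when the row right after it has condition_1 equal to 'True'.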
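# Per day: flag rows where condition_2 is 'True' and the next row's condition_1 is 'True'
m = (df.groupby('date', as_index=False, group_keys=False)
       .apply(lambda g: g['condition_2'].eq('True') & g['condition_1'].shift(-1).eq('True')))
# mask() fills the flagged rows with a real NaN by default (not the string 'NaN')
df['condition_2'] = df['condition_2'].mask(m)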
print(m)
0    False
1    False
2    False
3     True
4    False
5    False
dtype: bool
print(df)
         date condition_1 condition_2
0  2022-02-27        True         NaN
1  2022-02-27         NaN         NaN
2  2022-02-27         NaN        True
3  2022-02-28         NaN         NaN
4  2022-02-28        True         NaN
5  2022-02-28         NaN         NaN
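Note that shift(-1) only looks at the row immediately after. If a condition_2 'True' could be followed by a condition_1 'True' several rows later on the same day, a possible variation is a reversed running maximum per date. This is only a sketch under assumptions: the function name clear_condition_2_before_condition_1 is made up for illustration, the conditions are stored as the string 'True' as in the question, and the rows are already in chronological order within each date.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with real NaN values instead of 'NaN' strings
df = pd.DataFrame({
    "date": ["2022-02-27", "2022-02-27", "2022-02-27",
             "2022-02-28", "2022-02-28", "2022-02-28"],
    "condition_1": ["True", np.nan, np.nan, np.nan, "True", np.nan],
    "condition_2": [np.nan, np.nan, "True", "True", np.nan, np.nan],
})

def clear_condition_2_before_condition_1(df):
    """Hypothetical helper: set condition_2 to NaN when any later row
    of the same date has condition_1 == 'True'."""
    rev = df.iloc[::-1]  # walk each day from its last row to its first
    # Running max over the reversed rows: 1 once a condition_1 'True'
    # has been seen at or after the current row within the same date
    later_c1 = (rev["condition_1"].eq("True").astype(int)
                .groupby(rev["date"]).cummax()
                .astype(bool)
                .iloc[::-1])  # restore the original row order
    # Both conditions are never 'True' in the same row, so "at or after"
    # means "strictly after" for every condition_2 'True' row
    m = df["condition_2"].eq("True") & later_c1
    out = df.copy()
    out["condition_2"] = out["condition_2"].mask(m)  # flagged rows become NaN
    return out

result = clear_condition_2_before_condition_1(df)
print(result)
```

On the sample data this flags the same single row (index 3) as the shift approach. The difference would only show up when there are other rows between the condition_2 row and the condition_1 row.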