How to do operations inside pandas dataframe based on conditions

Time:07-02

I have this pandas dataframe:

[Image: the dataframe I have]

I want the following:

If, on a given day, a row where condition_2 is 'True' comes BEFORE a row where condition_1 is 'True', then change that condition_2 value to NaN.

Note on the structure: condition_1 and condition_2 are never both 'True' in the same row.

So, for the dataframe above, this is what the result should look like:

[Image: the dataframe I want to have]

This is sample code to build the dataframe:

import pandas as pd

tbl = {"date": ["2022-02-27", "2022-02-27", "2022-02-27", "2022-02-28", "2022-02-28", "2022-02-28"],
       "condition_1": ["True", "NaN", "NaN", "NaN", "True", "NaN"],
       "condition_2": ["NaN", "NaN", "True", "True", "NaN", "NaN"]}

df = pd.DataFrame(tbl)
# Turn the 'NaN' placeholder strings into real missing values
df = df.replace('NaN', float('nan'))
# pd.to_datetime returns a new Series, so assign it back to the column
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
# A stable sort keeps the original row order within each day
df.sort_values(by="date", kind="mergesort", inplace=True)

Any ideas? Maybe I can use a for loop and if conditions?
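A plain loop does work for data this small. Below is a minimal sketch, assuming the rows are already sorted by date with their original within-day order preserved; the sample data is rebuilt inline, and `to_blank` is just an illustrative name. It is quadratic within each day, so a vectorized approach is preferable for large frames.

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({
    "date": ["2022-02-27"] * 3 + ["2022-02-28"] * 3,
    "condition_1": ["True", nan, nan, nan, "True", nan],
    "condition_2": [nan, nan, "True", "True", nan, nan],
})

# Collect the row labels to blank first, then assign once after the loop,
# so the frame is not modified while it is being iterated.
to_blank = []
for _, day in df.groupby("date", sort=False):
    c1 = day["condition_1"].eq("True").tolist()
    c2 = day["condition_2"].eq("True").tolist()
    for i, label in enumerate(day.index):
        # Blank condition_2 if any LATER row of the same day has condition_1 'True'.
        if c2[i] and any(c1[i + 1:]):
            to_blank.append(label)

df.loc[to_blank, "condition_2"] = nan
print(df)
```

Only row 3 (2022-02-28) is blanked; on 2022-02-27 the condition_1 'True' comes before the condition_2 'True', so that row is kept.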

Answer:

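You can group by the date column and, within each day, compare condition_2 with the next row's condition_1 using shift(-1). Note that this only checks the row immediately after each condition_2 row.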
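To see what the shifted comparison looks at, here is a small sketch that stores the within-day `shift(-1)` of condition_1 in a helper column (`next_condition_1` is a hypothetical name used only for illustration), using the same sample data rebuilt inline:

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({
    "date": ["2022-02-27"] * 3 + ["2022-02-28"] * 3,
    "condition_1": ["True", nan, nan, nan, "True", nan],
    "condition_2": [nan, nan, "True", "True", nan, nan],
})

# shift(-1) inside each date group moves every value up by one row, so each row
# sees condition_1 of the NEXT row on the same day; each day's last row gets NaN,
# which means the comparison never crosses into the following day.
df["next_condition_1"] = df.groupby("date")["condition_1"].shift(-1)
print(df)
```

Row 3 is the only row where condition_2 is 'True' and `next_condition_1` is 'True', which is exactly the row the answer's mask flags.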

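# For each date, flag rows where condition_2 is 'True' and the next row's condition_1
# is 'True' (shift(-1) runs per group, so it never looks into the next day)
m = (df.groupby('date', as_index=False, group_keys=False)
     .apply(lambda g: g['condition_2'].eq('True') & g['condition_1'].shift(-1).eq('True')))

# mask() without a replacement value writes a real NaN instead of the string 'NaN'
df['condition_2'] = df['condition_2'].mask(m)
print(m)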

0    False
1    False
2    False
3     True
4    False
5    False
dtype: bool

print(df)

         date condition_1 condition_2
0  2022-02-27        True         NaN
1  2022-02-27         NaN         NaN
2  2022-02-27         NaN        True
3  2022-02-28         NaN         NaN
4  2022-02-28        True         NaN
5  2022-02-28         NaN         NaN
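Because shift(-1) only inspects the immediately following row, a condition_1 'True' two or more rows later on the same day would not be caught. A minimal sketch that applies the rule to any later row of the same day, assuming rows are already in chronological order within each date; `blank_cond2_before_cond1` is a hypothetical helper name, not a pandas API:

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({
    "date": ["2022-02-27"] * 3 + ["2022-02-28"] * 3,
    "condition_1": ["True", nan, nan, nan, "True", nan],
    "condition_2": [nan, nan, "True", "True", nan, nan],
})


def blank_cond2_before_cond1(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy where condition_2 is NaN wherever a
    LATER row on the same date has condition_1 == 'True'."""
    out = frame.copy()
    c1 = out["condition_1"].eq("True").astype(int)
    by_day = c1.groupby(out["date"])
    # Number of condition_1 'True' rows strictly after the current row, per date:
    # the day's total minus the running total (which includes the current row).
    later_c1 = by_day.transform("sum") - by_day.cumsum()
    m = out["condition_2"].eq("True") & (later_c1 > 0)
    # mask() without a replacement value writes NaN where m is True
    out["condition_2"] = out["condition_2"].mask(m)
    return out


result = blank_cond2_before_cond1(df)
print(result)
```

The design avoids `apply`: one group-wise sum and one group-wise cumulative sum give, for every row, how many condition_1 hits remain later in that day.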