I have a dataframe:
index col1 col2 col3
0 2022-09-25 21:00:00 2022-09-25 20:00:00 NaN
1 2022-09-25 21:26:00 NaN NaN
2 2022-09-25 21:00:00 2022-09-25 22:00:00 2022-09-25 23:00:00
3 2022-09-25 21:00:00 2022-09-25 22:00:00 2022-09-25 20:00:00
4 2022-09-25 21:00:00 2022-09-25 23:00:00 NaN
I want to keep only the rows for which col1 < col2 and col2 < col3. Any comparison involving a missing value (for example date < NaN) should count as True.
My desired output is:
index col1 col2 col3
1 2022-09-25 21:26:00 NaN NaN
2 2022-09-25 21:00:00 2022-09-25 22:00:00 2022-09-25 23:00:00
4 2022-09-25 21:00:00 2022-09-25 23:00:00 NaN
Index 0 is removed because col1 > col2, and index 3 is removed because col2 > col3.
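To make the problem concrete, here is a minimal reproduction. It assumes the NaN cells are missing values that become NaT after pd.to_datetime (None is used for them here). The straightforward mask keeps only index 2, because every ordering comparison involving NaT evaluates to False:

```python
import pandas as pd

# sample data from the question; None marks the missing cells
df = pd.DataFrame({
    'col1': ['2022-09-25 21:00:00', '2022-09-25 21:26:00', '2022-09-25 21:00:00',
             '2022-09-25 21:00:00', '2022-09-25 21:00:00'],
    'col2': ['2022-09-25 20:00:00', None, '2022-09-25 22:00:00',
             '2022-09-25 22:00:00', '2022-09-25 23:00:00'],
    'col3': [None, None, '2022-09-25 23:00:00', '2022-09-25 20:00:00', None],
}).apply(pd.to_datetime)  # missing values become NaT

# a single comparison against NaT is False
print(pd.Timestamp('2022-09-25 21:26:00') < pd.NaT)  # False

# naive filter: rows 1 and 4 are dropped because their NaT comparisons are False
naive = df[(df['col1'] < df['col2']) & (df['col2'] < df['col3'])]
print(naive.index.tolist())  # [2]
```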
Answer:
By default, an ordering comparison (<, <=, >, >=) with NaT always returns False. To get True instead, use the reverse condition and invert it:
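import pandas as pd
# ensure the three columns are datetime (NaN becomes NaT)
df[['col1', 'col2', 'col3']] = df[['col1', 'col2', 'col3']].apply(pd.to_datetime)
# NOT (col1 >= col2): True when col1 < col2, or when either value is NaT
m1 = ~df['col1'].ge(df['col2'])
# NOT (col2 >= col3): True when col2 < col3, or when either value is NaT
m2 = ~df['col2'].ge(df['col3'])
out = df[m1 & m2]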
Output:
index col1 col2 col3
1 1 2022-09-25 21:26:00 NaT NaT
2 2 2022-09-25 21:00:00 2022-09-25 22:00:00 2022-09-25 23:00:00
4 4 2022-09-25 21:00:00 2022-09-25 23:00:00 NaT
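If the inverted comparison feels too implicit, the same rule can be spelled out with explicit isna() checks. This is a sketch on the question's sample data (None again standing in for the NaN cells); it also checks that the result matches the ~ge masks from the answer above:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['2022-09-25 21:00:00', '2022-09-25 21:26:00', '2022-09-25 21:00:00',
             '2022-09-25 21:00:00', '2022-09-25 21:00:00'],
    'col2': ['2022-09-25 20:00:00', None, '2022-09-25 22:00:00',
             '2022-09-25 22:00:00', '2022-09-25 23:00:00'],
    'col3': [None, None, '2022-09-25 23:00:00', '2022-09-25 20:00:00', None],
}).apply(pd.to_datetime)

# col1 < col2, or either side missing
m1 = (df['col1'] < df['col2']) | df['col1'].isna() | df['col2'].isna()
# col2 < col3, or either side missing
m2 = (df['col2'] < df['col3']) | df['col2'].isna() | df['col3'].isna()
out = df[m1 & m2]
print(out.index.tolist())  # [1, 2, 4]

# same masks as the inverted >= version
inverted = ~df['col1'].ge(df['col2']) & ~df['col2'].ge(df['col3'])
print((m1 & m2).equals(inverted))  # True
```

The two forms agree because, for non-missing datetimes, "not (a >= b)" is exactly "a < b", and when either side is NaT the >= test is False, so its negation is True.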
Answer:
My own approach is as follows:
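import pandas as pd
# time from col1 to col2 in seconds (NaN if either value is missing)
df['diff_1'] = (pd.to_datetime(df['col2']) - pd.to_datetime(df['col1'])).dt.total_seconds()
# keep rows where col1 < col2, or where the difference is missing
m1 = (df['diff_1'] > 0) | df['diff_1'].isnull()
# time from col2 to col3 in seconds (NaN if either value is missing)
df['diff_2'] = (pd.to_datetime(df['col3']) - pd.to_datetime(df['col2'])).dt.total_seconds()
# keep rows where col2 < col3, or where the difference is missing
m2 = (df['diff_2'] > 0) | df['diff_2'].isnull()
out = df[m1 & m2]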
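Both answers apply the same rule to each adjacent pair of columns. If more than three columns are involved, the pair checks can be chained in a loop. The helper name increasing_or_missing is hypothetical; it is a sketch that reuses the ~ge trick from the first answer and assumes the columns are already datetime:

```python
import pandas as pd

def increasing_or_missing(frame: pd.DataFrame, cols: list) -> pd.Series:
    """Hypothetical helper: True where every adjacent pair in cols satisfies
    left < right, treating any comparison that involves NaT as True."""
    mask = pd.Series(True, index=frame.index)
    for left, right in zip(cols, cols[1:]):
        # ~(left >= right) is True when left < right or when either side is NaT
        mask &= ~frame[left].ge(frame[right])
    return mask

df = pd.DataFrame({
    'col1': ['2022-09-25 21:00:00', '2022-09-25 21:26:00', '2022-09-25 21:00:00',
             '2022-09-25 21:00:00', '2022-09-25 21:00:00'],
    'col2': ['2022-09-25 20:00:00', None, '2022-09-25 22:00:00',
             '2022-09-25 22:00:00', '2022-09-25 23:00:00'],
    'col3': [None, None, '2022-09-25 23:00:00', '2022-09-25 20:00:00', None],
}).apply(pd.to_datetime)

out = df[increasing_or_missing(df, ['col1', 'col2', 'col3'])]
print(out.index.tolist())  # [1, 2, 4]
```

Like the question, this only compares neighbouring columns, so a row such as (21:00, NaT, 20:00) would pass because both of its comparisons involve NaT.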