I have the following dataframe:
import pandas as pd
import numpy as np
np.random.seed(0)
# create an array of 5 dates starting at '2015-02-24', one per minute
rng = pd.date_range('2021-07-29', periods=5, freq='D')
rng_1 = pd.date_range('2021-07-30', periods=5, freq='D')
rng_2 = pd.date_range('2021-07-31', periods=5, freq='D')
df_status = ['received', 'send', 'received', 'send', 'send']
df = pd.DataFrame({ 'Date': rng, 'Date_1': rng_1, 'Date_2': rng_2, 'status': df_status })
print(df)
I would like to print out all the rows if at least one column contains a date that is equal to, or at least 2021-08-01. What would be the most effective way to do this?
I have tried to do this with the following code, however, I get the following error:
start_date = '2022-08-01'
start_date = pd.to_datetime(start_date, format="%Y/%m/%d")
mask = (df['Date'] >= start_date | df['Date_1'] >= start_date | df['Date_3'] >= start_date)
TypeError: unsupported operand type(s) for &: 'Timestamp' and 'DatetimeArray'
Thank you in advance.
CodePudding user response:
Use boolean indexing with any
:
df[df.ge('2021-08-01').any(1)]
output:
Date Date_1 Date_2
1 2021-07-30 2021-07-31 2021-08-01
2 2021-07-31 2021-08-01 2021-08-02
3 2021-08-01 2021-08-02 2021-08-03
4 2021-08-02 2021-08-03 2021-08-04
intermediate:
df.ge('2021-08-01').any(1)
0 False
1 True
2 True
3 True
4 True
dtype: bool
using only the date columns
filtering by name (Date
in the column name):
df[df.filter(like='Date').ge('2021-08-01').any(1)]
filtering by type:
df[df.select_dtypes('datetime64[ns]').ge('2021-08-01').any(1)]
CodePudding user response:
You may use any
inside apply
:
df[df.apply(lambda x: any([x[col] >= pd.to_datetime('2021-08-01') for col in df.columns]), axis=1)]