Filter rows in a dataframe between two dates

Time:09-13

I need to filter rows in a dataframe, based on the value of the "date" column. This is a sample of such a column:

2022-09-08 09:00:00-05
2022-09-08 12:00:00-05
2022-09-08 15:00:00-05
2022-09-08 18:00:00-05
2022-09-09 09:00:00-05
2022-09-09 12:00:00-05
2022-09-09 15:00:00-05
2022-09-09 18:00:00-05
2022-09-10 09:00:00-05
2022-09-10 12:00:00-05
2022-09-10 15:00:00-05

Dtype is set to:

datetime64[ns, UTC-05:00]

The "end_date" is determined by the max value of the "date" column, and the "start_date" is the "end_date" minus 14 days. I don't really need the time part, so I've removed it using .date()

data_df = pd.DataFrame(data=data, columns=columns)
end_date = pd.to_datetime(data_df['date'].max().date())
start_date = pd.to_datetime(end_date - timedelta(days=14))
mask = (data_df['date'] > start_date) & (data_df['date'] <= end_date)
filtered_df = data_df.loc[mask]

When I run the code, I get the following error:

TypeError: Cannot compare tz-naive and tz-aware datetime-like objects
TypeError: Invalid comparison between dtype=datetime64[ns, UTC-05:00] and Timestamp

How can I fix this?

CodePudding user response:

# Convert the whole column to datetime [If necessary]
# df.date = pd.to_datetime(df.date)

# Get max date
max_date = df.date.max().date()

# Here I use '1d' (1 day) for an interesting result.
# You'll use '14d' instead.
mask = df.date.dt.date.between(max_date - pd.to_timedelta('1d'), max_date)
f_df = df.loc[mask]
print(f_df)

# Output:
                        date
4  2022-09-09 09:00:00-05:00
5  2022-09-09 12:00:00-05:00
6  2022-09-09 15:00:00-05:00
7  2022-09-09 18:00:00-05:00
8  2022-09-10 09:00:00-05:00
9  2022-09-10 12:00:00-05:00
10 2022-09-10 15:00:00-05:00
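
If you would rather keep the comparison timezone-aware instead of converting to dates, here is a minimal sketch. The sample rows and the names end_day, start and stop are made up for illustration; the idea is that Timestamp.normalize() drops the time part but keeps the timezone, so the bounds stay comparable with the column.

```python
import pandas as pd

# Hypothetical sample data; the offset matches the question's UTC-05:00 column.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-08-20 09:00:00-05:00",
        "2022-09-09 09:00:00-05:00",
        "2022-09-10 15:00:00-05:00",
    ])
})

# normalize() resets the time to midnight but keeps the timezone,
# so both bounds are tz-aware like the column itself.
end_day = df["date"].max().normalize()
start = end_day - pd.Timedelta(days=14)
stop = end_day + pd.Timedelta(days=1)  # exclusive upper bound: end of the max day

mask = (df["date"] > start) & (df["date"] < stop)
f_df = df.loc[mask]
print(f_df)
```

Note that the lower bound here is midnight of the start day, so rows later on that same day pass the filter. Shift start by one day if you want to exclude the whole start day.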

TL;DR

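If you want to keep your current approach, both occurrences of data_df['date'] in the mask should be data_df['date'].dt.date instead. You will probably also need to leave start_date and end_date as plain date objects (drop the pd.to_datetime wrappers), since recent pandas versions refuse to order a Timestamp against a date.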
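
For reference, a sketch of the question's code with that change applied. The sample rows are invented, and the dates variable is just a helper name for the converted column; everything else follows the original snippet.

```python
from datetime import timedelta

import pandas as pd

# Hypothetical sample data with the same UTC-05:00 offset as in the question.
data_df = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-08-20 09:00:00-05:00",
        "2022-09-08 09:00:00-05:00",
        "2022-09-09 18:00:00-05:00",
        "2022-09-10 15:00:00-05:00",
    ])
})

# Compare plain datetime.date objects on both sides.
dates = data_df["date"].dt.date
end_date = data_df["date"].max().date()      # no pd.to_datetime wrapper
start_date = end_date - timedelta(days=14)

mask = (dates > start_date) & (dates <= end_date)
filtered_df = data_df.loc[mask]
print(filtered_df)
```

Because the comparison is now by calendar date, every row on the max date is kept, whatever its time of day.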
