I want to compare a timestamp column (dtype datetime64[ns]) with a datetime.date.
The comparison should be based only on the day and month.
df
timestamp last_price
0 2023-01-22 14:15:06.033314 100.0
1 2023-01-25 14:15:06.213591 101.0
2 2023-01-30 14:15:06.313554 102.0
3 2023-03-31 14:15:07.018540 103.0
cu_date = datetime.datetime.now().date()
cu_year = cu_date.year
check_end_date = datetime.datetime.strptime(f'{cu_year}-11-05', '%Y-%m-%d').date()
check_start_date = datetime.datetime.strptime(f'{cu_year}-03-12', '%Y-%m-%d').date()
# this is incorrect as the day can be greater than check_start_date while the month might be less.
daylight_off_df = df.loc[((df.timestamp.dt.month >= check_end_date.month) & (df.timestamp.dt.day >= check_end_date.day)) |
((df.timestamp.dt.month <= check_start_date.month) & (df.timestamp.dt.day <= check_start_date.day))]
daylight_on_df = df.loc[((df.timestamp.dt.month <= check_end_date.month) & (df.timestamp.dt.day <= check_end_date.day)) &
((df.timestamp.dt.month >= check_start_date.month) & (df.timestamp.dt.day >= check_start_date.day))]
I am trying to come up with the logic to do this, but failing.
Expected output:
daylight_off_df
timestamp last_price
0 2023-01-22 14:15:06.033314 100.0
1 2023-01-25 14:15:06.213591 101.0
2 2023-01-30 14:15:06.313554 102.0
daylight_on_df
timestamp last_price
3 2023-03-31 14:15:07.018540 103.0
In summary: split the dataframe based on a day-and-month comparison while ignoring the year.
CodePudding user response:
I would break these values out into their own columns and then just query:
df['day'] = df['timestamp'].dt.day_name()
df['month'] = df['timestamp'].dt.month_name()
then whatever you're looking for:
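# select the numeric column; recent pandas versions raise on the string 'day' column otherwise
df.groupby('month')['last_price'].mean()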
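If the goal is the on/off split from the question rather than an aggregate, the same "break out the values" idea can be sketched with numeric month and day values. This is only a sketch: the sample frame is rebuilt from the question, the year in the check dates is fixed to 2023 for reproducibility, and treating both boundary dates as inclusive is an assumption. The key point is that the day is only compared when the month is equal.

```python
import datetime
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-22 14:15:06.033314",
        "2023-01-25 14:15:06.213591",
        "2023-01-30 14:15:06.313554",
        "2023-03-31 14:15:07.018540",
    ]),
    "last_price": [100.0, 101.0, 102.0, 103.0],
})

# Year fixed here for reproducibility; only month and day are used below.
check_start_date = datetime.date(2023, 3, 12)
check_end_date = datetime.date(2023, 11, 5)

month = df["timestamp"].dt.month
day = df["timestamp"].dt.day

# On or after the start: a later month, or the same month with day >= start day.
on_or_after_start = (month > check_start_date.month) | (
    (month == check_start_date.month) & (day >= check_start_date.day)
)
# On or before the end: an earlier month, or the same month with day <= end day.
on_or_before_end = (month < check_end_date.month) | (
    (month == check_end_date.month) & (day <= check_end_date.day)
)

in_range = on_or_after_start & on_or_before_end
daylight_on_df = df.loc[in_range]
daylight_off_df = df.loc[~in_range]
```

No helper columns are added to df here; month and day live in separate Series, so the original table stays unchanged.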
CodePudding user response:
The following attributes could be helpful if you don't want an additional column in your table:
check_end_date.timetuple().tm_yday # returns day of the year
#output 309
check_start_date.timetuple().tm_yday
#output 71
df['timestamp'].dt.is_leap_year.astype(int) #returns 1 if year is a leapyear
#output 0 | 1
df['timestamp'].dt.dayofyear #returns day of the year
#output
#0 22
#1 25
#2 30
#3 90
df['timestamp'].dt.dayofyear.between(a,b) #returns true if day is between a,b
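To see why is_leap_year is useful here, a small sketch with the same calendar date in a non-leap and a leap year may help. It assumes the reference day-of-year comes from a non-leap year (2023 here) and that the threshold date falls after February; for a January or February threshold, or a reference date taken from a leap year, adding the offset would shift the threshold one day too far.

```python
import datetime
import pandas as pd

# The same calendar date (12 March) in a non-leap year and in a leap year.
march_12 = pd.Series(pd.to_datetime(["2023-03-12", "2024-03-12"]))

day_of_year = march_12.dt.dayofyear                  # differs by one between the two years
leap_offset = march_12.dt.is_leap_year.astype(int)   # 0 for 2023, 1 for 2024

# Day of the year for 12 March, taken from a non-leap reference year.
reference_yday = datetime.date(2023, 3, 12).timetuple().tm_yday

# Adding the leap offset moves the threshold one day later in leap years,
# so it lines up with the same calendar date again.
shifted_threshold = reference_yday + leap_offset
matches = day_of_year == shifted_threshold
```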
With these, there are a few possible solutions. I think using between is the nicest-looking one:
daylight_on_df4 = df.loc[df['timestamp'].dt.dayofyear.between(
    check_start_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int),
    check_end_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int))]
daylight_off_df4 = df.loc[~df['timestamp'].dt.dayofyear.between(
    check_start_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int),
    check_end_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int))]
Or the code could look like this:
# note: the strict comparisons (> 0 and < 0) leave the two boundary days out of both frames
daylight_on_df3 = df.loc[((check_end_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int)) - df['timestamp'].dt.dayofyear > 0)
    & (df['timestamp'].dt.dayofyear - (df['timestamp'].dt.is_leap_year.astype(int) + check_start_date.timetuple().tm_yday) > 0)]
daylight_off_df3 = df.loc[((check_end_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int)) - df['timestamp'].dt.dayofyear < 0)
    | (df['timestamp'].dt.dayofyear - (check_start_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int)) < 0)]
All that daylight_on/off do now is check whether the day of the year falls inside your range or not (taking leap years into account). The formula would probably have to be rewritten if your start date / end date crossed a year boundary (e.g. 2022-11-19, 2023-02-22), but I think it conveys the general idea.
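For a range that crosses a year boundary, one possible rewrite is sketched below. It swaps the day-of-year comparison for an integer month/day key (month * 100 + day), which also makes the leap-year adjustment unnecessary. The helper names month_day_key and in_month_day_range are made up for this example, and inclusive boundaries are an assumption.

```python
import datetime
import pandas as pd


def month_day_key(month, day):
    """Encode (month, day) as an integer, e.g. 312 for 12 March."""
    return month * 100 + day


def in_month_day_range(timestamps, start, end):
    """Boolean mask: True where month/day lies in [start, end], ignoring the year.

    If start falls later in the calendar than end, the range is treated as
    wrapping over New Year (e.g. 19 November to 22 February).
    """
    key = month_day_key(timestamps.dt.month, timestamps.dt.day)
    start_key = month_day_key(start.month, start.day)
    end_key = month_day_key(end.month, end.day)
    if start_key <= end_key:
        return key.between(start_key, end_key)  # inclusive on both ends
    return (key >= start_key) | (key <= end_key)


# Sample data rebuilt from the question.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-22 14:15:06.033314",
        "2023-01-25 14:15:06.213591",
        "2023-01-30 14:15:06.313554",
        "2023-03-31 14:15:07.018540",
    ]),
    "last_price": [100.0, 101.0, 102.0, 103.0],
})

# Range within one year: 12 March to 5 November.
on_mask = in_month_day_range(
    df["timestamp"], datetime.date(2023, 3, 12), datetime.date(2023, 11, 5)
)
daylight_on_df = df.loc[on_mask]
daylight_off_df = df.loc[~on_mask]

# Range crossing New Year: 19 November to 22 February.
wrap_mask = in_month_day_range(
    df["timestamp"], datetime.date(2022, 11, 19), datetime.date(2023, 2, 22)
)
```

Because the key only depends on month and day, 12 March maps to 312 in every year, leap or not.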