Pandas - compare day and month only against a datetime?

Time:01-31

I want to compare a timestamp column of dtype datetime64[ns] with a datetime.date, but the comparison should be based on day and month only.

df

                  timestamp  last_price
0 2023-01-22 14:15:06.033314     100.0
1 2023-01-25 14:15:06.213591     101.0
2 2023-01-30 14:15:06.313554     102.0
3 2023-03-31 14:15:07.018540     103.0

import datetime

cu_date = datetime.datetime.now().date()
cu_year = cu_date.year
check_end_date = datetime.datetime.strptime(f'{cu_year}-11-05', '%Y-%m-%d').date()
check_start_date = datetime.datetime.strptime(f'{cu_year}-03-12', '%Y-%m-%d').date()

# this is incorrect as the day can be greater than check_start_date while the month might be less. 
daylight_off_df = df.loc[((df.timestamp.dt.month >= check_end_date.month) & (df.timestamp.dt.day >= check_end_date.day)) |
                         ((df.timestamp.dt.month <= check_start_date.month) & (df.timestamp.dt.day <= check_start_date.day))]
daylight_on_df = df.loc[((df.timestamp.dt.month <= check_end_date.month) & (df.timestamp.dt.day <= check_end_date.day)) &
                        ((df.timestamp.dt.month >= check_start_date.month) & (df.timestamp.dt.day >= check_start_date.day))]

I am trying to work out the logic for this, but failing.

Expected output:

daylight_off_df

                  timestamp  last_price
0 2023-01-22 14:15:06.033314     100.0
1 2023-01-25 14:15:06.213591     101.0
2 2023-01-30 14:15:06.313554     102.0

daylight_on_df

                   timestamp  last_price
3 2023-03-31 14:15:07.018540     103.0

In summary: split the dataframe by comparing day and month only, ignoring the year.

CodePudding user response:

I would break out these values then just query

df['day'] = df['timestamp'].dt.day_name()
df['month'] = df['timestamp'].dt.month_name()

then whatever you're looking for:

df.groupby('month')['last_price'].mean()

CodePudding user response:

The following attributes could be helpful if you don't want an additional column in your table:

check_end_date.timetuple().tm_yday # returns day of the year 
#output 309  
check_start_date.timetuple().tm_yday
#output 71
df['timestamp'].dt.is_leap_year.astype(int) #returns 1 if year is a leapyear
#output 0 | 1
df['timestamp'].dt.dayofyear #returns day of the year
#output 
#0    22
#1    25
#2    30
#3    90
df['timestamp'].dt.dayofyear.between(a, b) #returns True if the day of the year is between a and b (inclusive)
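
To make the purpose of the is_leap_year offset concrete, here is a minimal sketch. The two sample dates are hypothetical and are not taken from the question. It assumes the boundary comes from a non-leap reference year and falls after February; if the reference year were itself a leap year, adding the offset would double-count.

```python
import datetime
import pandas as pd

# Hypothetical sample: the same calendar date in a common year and a leap year.
dates = pd.Series(pd.to_datetime(["2023-03-12", "2024-03-12"]))

day_of_year = dates.dt.dayofyear                  # 71 in 2023, 72 in 2024
leap_offset = dates.dt.is_leap_year.astype(int)   # 0 in 2023, 1 in 2024

# Boundary taken from a non-leap reference year (assumption, see above).
start_yday = datetime.date(2023, 3, 12).timetuple().tm_yday  # 71

# Adding the offset lines the boundary up with each row's own year.
adjusted_start = start_yday + leap_offset
print((day_of_year == adjusted_start).tolist())
```

Without the offset, 12 March 2024 (day 72) would be compared against 71 and the boundary would be off by one day in leap years.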

There are several possible solutions now. I think using between is the cleanest-looking one.

daylight_on_df4 = df.loc[df['timestamp'].dt.dayofyear.between(
    check_start_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int),
    check_end_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int))]
daylight_off_df4 = df.loc[~df['timestamp'].dt.dayofyear.between(
    check_start_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int),
    check_end_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int))]

or the code could look like this:

daylight_on_df3 = df.loc[((check_end_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int)) - df['timestamp'].dt.dayofyear >= 0)
                         & (df['timestamp'].dt.dayofyear - (df['timestamp'].dt.is_leap_year.astype(int) + check_start_date.timetuple().tm_yday) >= 0)]
daylight_off_df3 = df.loc[((check_end_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int)) - df['timestamp'].dt.dayofyear < 0)
                          | (df['timestamp'].dt.dayofyear - (check_start_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int)) < 0)]

All the daylight_on/off filters do is check whether the day of the year falls inside your range or not, taking leap years into account. The formula would probably need to be rewritten if your start and end dates crossed a year boundary (e.g. 2022-11-19 to 2023-02-22), but I think it conveys the general idea.
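
One possible way to handle a range that crosses a year boundary is to compare a combined month/day key instead of the day of the year. This swaps in a different technique from the answer above, and the helpers month_day_key and in_month_day_range are hypothetical names introduced only for this sketch. Because the key ignores the year entirely, no leap-year adjustment is needed.

```python
import pandas as pd

def month_day_key(month, day):
    """Encode month and day as one integer, e.g. 12 March -> 312."""
    return month * 100 + day

def in_month_day_range(timestamps, start, end):
    """Return a boolean mask for timestamps whose (month, day) lies in [start, end].

    start and end are (month, day) tuples, both inclusive. If start comes
    later in the calendar than end, the range is treated as wrapping over
    the new year (for example 19 November to 22 February).
    """
    key = month_day_key(timestamps.dt.month, timestamps.dt.day)
    lo = month_day_key(*start)
    hi = month_day_key(*end)
    if lo <= hi:
        return key.between(lo, hi)
    # Wrapping range: late in the year OR early in the year.
    return (key >= lo) | (key <= hi)

# Sample data copied from the question.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-22 14:15:06.033314",
        "2023-01-25 14:15:06.213591",
        "2023-01-30 14:15:06.313554",
        "2023-03-31 14:15:07.018540",
    ]),
    "last_price": [100.0, 101.0, 102.0, 103.0],
})

# Range from the question: 12 March to 5 November.
on_mask = in_month_day_range(df["timestamp"], (3, 12), (11, 5))
daylight_on_df = df.loc[on_mask]
daylight_off_df = df.loc[~on_mask]

# Wrapping range from the example above: 19 November to 22 February.
winter_mask = in_month_day_range(df["timestamp"], (11, 19), (2, 22))
```

With the question's data, daylight_on_df contains only row 3 and daylight_off_df contains rows 0 to 2, which matches the expected output. The wrapping mask selects the three January rows.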
