I'm trying to move an arbitrary input pd.Timestamp
back to the correct start offset unless it is already on offset.
The following code works with BusinessMonthEnd
, MonthEnd
, MonthStart
, and so on:
import pandas as pd
from pandas.tseries.offsets import *
def to_start(t: pd.Timestamp, freq: pd.DateOffset) -> pd.Timestamp:
return freq.rollback(t)
assert to_start(pd.to_datetime("2021-09-27"), MonthEnd()) == pd.to_datetime("2021-08-31")
assert to_start(pd.to_datetime("2021-09-27"), MonthBegin()) == pd.to_datetime("2021-08-31")
assert to_start(pd.to_datetime("2021-08-27"), BMonthEnd()) == pd.to_datetime("2021-07-30")
assert to_start(pd.to_datetime("2021-08-27"), YearBegin()) == pd.to_datetime("2021-01-01")
assert to_start(pd.to_datetime("2021-08-27"), BYearBegin()) == pd.to_datetime("2021-01-01")
# also it works nicely with holiday calendars
from pandas.tseries.holiday import USFederalHolidayCalendar
us_fed_biz_days = CustomBusinessDay(calendar=USFederalHolidayCalendar())
memorial_day = pd.to_datetime("2021-05-31")
the_friday_before_memorial_day = pd.to_datetime("2021-05-28")
assert to_start(memorial_day, us_fed_biz_days) == the_friday_before_memorial_day
However (and this is driving me crazy) it doesn't seem to work for Day
, BusinessDay
, Week
, Hour
etc:
assert to_start(pd.to_datetime("2021-08-27 05:00"), Day()) == pd.to_datetime("2021-08-27")
assert to_start(pd.to_datetime("2021-08-27 05:00"), BDay()) == pd.to_datetime("2021-08-27")
assert to_start(pd.to_datetime("2021-08-27 05:15"), Hour()) == pd.to_datetime("2021-08-27 05:00")
assert to_start(pd.to_datetime("2021-08-26"), pd.tseries.frequencies.to_offset("W-MON")) == pd.to_datetime("2021-08-24")
I have also tried this:
def to_start(t: pd.Timestamp, freq: pd.DateOffset) -> pd.Timestamp:
return pd.Period(t, freq=freq).start_time
which ironically works with the second set of assertions but not the first?
Is my expectation of the above assertions not failing unreasonable, and if so what am I missing, please?
CodePudding user response:
From your examples, I take it that you want to use fixed frequencies (e.g. an hour) to floor the given date/time while variable frequencies (e.g. business month end) should call .rollback
if the check given in .is_on_offset
returns True (see source code linked in the questions' comment section).
Ex:
def to_start(t: pd.Timestamp, freq: pd.DateOffset) -> pd.Timestamp:
try:
return t.floor(freq) # fixed frequencies should just floor the date/time
except ValueError: # if freq is variable, we fall into here...
return freq.rollback(t.floor("D"))
Tests:
# variable offsets that depend on the date
assert to_start(pd.to_datetime("2021-09-27"), pd.tseries.offsets.MonthEnd()) == pd.to_datetime("2021-08-31")
assert to_start(pd.to_datetime("2021-09-27"), pd.tseries.offsets.MonthBegin()) == pd.to_datetime("2021-09-01")
assert to_start(pd.to_datetime("2021-08-27"), pd.tseries.offsets.BMonthEnd()) == pd.to_datetime("2021-07-30")
assert to_start(pd.to_datetime("2021-08-27"), pd.tseries.offsets.YearBegin()) == pd.to_datetime("2021-01-01")
assert to_start(pd.to_datetime("2021-08-27"), pd.tseries.offsets.BYearBegin()) == pd.to_datetime("2021-01-01")
assert to_start(pd.to_datetime("2021-08-26"), pd.tseries.frequencies.to_offset("W-MON")) == pd.to_datetime("2021-08-23")
assert to_start(pd.to_datetime("2021-08-28 05:00"), pd.tseries.offsets.BDay()) == pd.to_datetime("2021-08-27")
# fixed offsets (e.g. an hour is always an hour)
assert to_start(pd.to_datetime("2021-08-27 05:00"), pd.tseries.offsets.Day()) == pd.to_datetime("2021-08-27")
assert to_start(pd.to_datetime("2021-08-27 05:15"), pd.tseries.offsets.Hour()) == pd.to_datetime("2021-08-27 05:00")