How to create a binary variable based on date ranges

Time:11-10

I would like to flag all rows whose dates fall within 1 week before or 1 week after a specific holiday with 1, and all other rows with 0.

What's the best way to do so? Below is my code, which only flags New Year's Day itself with new_year = 1. What I want is for all 3 rows to have new_year = 1 (since they all fall within 1 week before or after New Year's Day).

Note: I would like the code to work for any holiday (e.g. Thanksgiving, Easter, etc.).

Thank you!

# Import pandas, NumPy (needed for np.where below) and the holidays package
import numpy as np
import pandas as pd
import holidays

# Creating the dataframe
df = pd.DataFrame({'Date': ['1/1/2019', '1/5/2019', '12/28/2018'],
                   'Event': ['Music', 'Poetry', 'Theatre'],
                   'Cost': [10000, 5000, 15000]})
df['newDate'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

new_year = holidays.HolidayBase()

new_year.append({"2018-01-01": "New Year's Day",
                 "2019-01-01": "New Year's Day"})

# Test each date individually; `in` cannot be applied to a whole Series at once
df['hol_new_year'] = np.where(df['newDate'].apply(lambda d: d in new_year), 1, 0)

CodePudding user response:

You can use pandas' time series offsets:

ye = pd.tseries.offsets.YearEnd()
yb = pd.tseries.offsets.YearBegin()
d = pd.to_timedelta('1w')
s = df['newDate']

df['hol_new_year'] = (s.between(s-ye-d, s-ye+d)
                     |s.between(s+yb-d, s+yb+d)
                      ).astype(int)

Output (with an extra row, 1/15/2021, added to show a date outside the window):

         Date    Event   Cost    newDate  hol_new_year
0    1/1/2019    Music  10000 2019-01-01             1
1    1/5/2019   Poetry   5000 2019-01-05             1
2  12/28/2018  Theatre  15000 2018-12-28             1
3   1/15/2021       SO      0 2021-01-15             0
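
The offsets above are tied to the year boundary, so they only cover New Year's Day. For other holidays, one possible approach is to compare every row against an explicit list of holiday dates. The sketch below is an assumption on my part, not part of the answer above: the helper name flag_near_dates, its window parameter and the example date list are all hypothetical, and the list could be filled from any source, such as the holidays package used in the question.

```python
import pandas as pd

def flag_near_dates(dates, anchors, window="7D"):
    """Return 1 where a date lies within `window` of any anchor date, else 0.

    Hypothetical helper: `dates` is a datetime Series, `anchors` is any
    iterable of holiday dates, `window` is a pandas-style timedelta string.
    """
    dates = pd.to_datetime(dates)
    anchors = pd.to_datetime(pd.Series(list(anchors)))
    window = pd.Timedelta(window)

    # Start with all False, then switch on rows close to each holiday in turn
    flag = pd.Series(False, index=dates.index)
    for anchor in anchors:
        flag |= (dates - anchor).abs() <= window
    return flag.astype(int)

# Same data as in the output above, including the extra 2021 row
df = pd.DataFrame({'Date': ['1/1/2019', '1/5/2019', '12/28/2018', '1/15/2021']})
df['newDate'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

# Example holiday dates (assumed for illustration); swap in Thanksgiving,
# Easter, etc. to build the corresponding flag column
new_year_dates = ['2018-01-01', '2019-01-01', '2021-01-01']
df['hol_new_year'] = flag_near_dates(df['newDate'], new_year_dates)
print(df)
```

The loop runs once per holiday date rather than once per row, so it should stay fast as long as the holiday list is short. Note that the window here is centered on January 1 itself and includes both endpoints.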