Home > Software design >  how to label a specific date in pandas, and the other n days before and after it?
how to label a specific date in pandas, and the other n days before and after it?

Time:09-27

I got a question about how to label specific days and other days close to it. I have two pandas dataframes, one with Canadian holidays like the one below:

ds          holiday
2019-12-25  Christmas Day
2019-12-26  Boxing Day
2020-01-01  New Year's Day
2020-02-17  Family Day
2020-04-10  Good Friday

and another data frame like the one below:

ds                  y
2020-01-01          5      
2020-01-02          941    
2020-01-03          633   
2020-01-04          633    
2020-01-05          633    
2020-01-06          633
.                   .      
.                   .      
.                   .          

I want to create a new column in my second dataframe, in which I label as 1 the days in common and also the 3 days before and after the dates in common. In poor words, I want to label all the holidays in my df and also the 3 days before and after the holidays.

I would like something like this as output:

ds                  y     label
2020-01-01          5      1
2020-01-02          941    1
2020-01-03          633    1
2020-01-04          633    1
2020-01-05          633    0
2020-01-06          633    0
.                   .      .
.                   .      .
.                   .      .

I was wondering how I can do it.

Thanks for your help and have a great day!

CodePudding user response:

Here is a sample solution:

import pandas as pd
import numpy as np
import datetime
df1 = pd.DataFrame({'ds': ['2019-12-25', '2019-12-26', '2020-01-01', '2020-02-17', '2020-04-10'], 
'holiday': ["Christmas Day", 'Boxing Day' ,"New Year's Day" ,"Family Day" ,"Good Friday"]}    )

df2 = pd.DataFrame({'ds': ['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04', '2020-01-05', '2020-01-06'],
 'y': [5, 941, 633, 633, 633, 633]})

df1['ds'] = df1['ds'].astype('datetime64')
df2['ds'] = df2['ds'].astype('datetime64')


#convert to dic for fast look up  - 3 days
dic = {}
for i in range(df1.shape[0]):
    dic[df1['ds'][i]] = 1
    dic[df1['ds'][i]   datetime.timedelta(days=-3)] = 1
    dic[df1['ds'][i]   datetime.timedelta(days=-2)] = 1
    dic[df1['ds'][i]   datetime.timedelta(days=-1)] = 1
    dic[df1['ds'][i]   datetime.timedelta(days=1)] = 1
    dic[df1['ds'][i]   datetime.timedelta(days=2)] = 1
    dic[df1['ds'][i]   datetime.timedelta(days=3)] = 1
    

df2['label'] = np.zeros(df2.shape[0], dtype=np.int32)
for i in range(df2.shape[0]):
    if df2['ds'][i] in dic:
        df2['label'][i] = 1

CodePudding user response:

This worked fine for me

def get_holiday_range(df, day_range):

    df['holiday_range'] = 0

    for holiday in holidays['ds'].values : 

        holiday_range = np.arange(holiday - np.timedelta64(day_range,'D'), holiday   np.timedelta64(day_range,'D'), np.timedelta64(1, "D"))

        df.loc[df.call_datetime_date.isin(holiday_range), 'holiday_range']  = 1
        
    return df
  • Related