Home > Blockchain >  Creating day after long weekend flag in pandas
Creating day after long weekend flag in pandas

Time:10-19

I want to create a new column in pandas data frame 'day_after_long_weekend' based on the conditions shown in the image.

Sat and Sun are default holidays

  • condition 1 - if Fri is a holiday (national/provincial) then Monday is the day after long weekend.
  • condition 2 - if Fri and following Monday is a holiday then Tuesday is the day after long weekend.
  • condition 3 - if Mon and Tuesday is a holiday then Wednesday will be 1, etc.

df = pd.DataFrame([[0, 1, 'Fri'], [0, 0, 'Sat'], [0, 0, 'Sun'], [0, 0, 'Mon'],[0, 0, 'Tue'],[0, 0, 'Wed'],[0, 0, 'Thu'],[0, 1, 'Fri'],[0, 0, 'Sat'],[0, 0, 'Sun'],[1, 0, 'Mon'],[0, 0, 'Tue']] , columns=['national_holiday', 'provincial_holiday','day_of week'])

How can I do this? Any help will be appreciated.

enter image description here

CodePudding user response:

Use masks and a rolling sum:

# is a weekend?
m1 = df['day_of week'].isin(['Sat', 'Sun'])
# is a holiday?
m2 = df[['national_holiday', 'provincial_holiday']].eq(1).any(axis=1)

# weekend or holiday
m = (m1|m2)

# is there 3 weekend days or holidays in the last 3 days
# and today is a working day?
df['day_after_long_weekend'] = m.rolling(3).sum().shift().eq(3) & ~m

Output:

Code
 
In [19]:
m1 = df['day_of week'].isin(['Sat', 'Sun'])
​
m2 = df[['national_holiday', 'provincial_holiday']].eq(1).any(axis=1)
​
m = (m1|m2)
​
df['day_after_long_weekend'] = (m.rolling(3).sum().shift().eq(3) & ~m).astype(int)
​
print(df)
    national_holiday  provincial_holiday day_of week  day_after_long_weekend
0                  0                   1         Fri                       0
1                  0                   0         Sat                       0
2                  0                   0         Sun                       0
3                  0                   0         Mon                       1
4                  0                   0         Tue                       0
5                  0                   0         Wed                       0
6                  0                   0         Thu                       0
7                  0                   1         Fri                       0
8                  0                   0         Sat                       0
9                  0                   0         Sun                       0
10                 1                   0         Mon                       0
11                 0                   0         Tue                       1

CodePudding user response:

I first added columns here for is_weekend and is_holiday like so:

df['is_weekend'] = df['day_of_week'].isin(['Saturday','Sunday'])
df['is_holiday'] = df['national_holiday'] | df['provincial_holiday'] | df['is_weekend'] # i set up the columns to be True/False instead of 1/0, so check that if you have an issue

So now we have df['is_holiday'] that is True/False for if work is cancelled that day.

Now we want to find days where: (1) it is not a holiday, (2) yesterday was a holiday, and (3) there were more than two holidays in a row. We can do that like this:

df['long_weekend'] = (df['holiday'] == False) & (df['holiday'].shift(1) == True) & ((df.groupby((df['holiday'] != df['holiday'].shift(1)).cumsum()).cumcount() 1).shift(1) > 2)

I did pull the logic for the cumulative count of previous days from this other SO question/answer: Pandas: conditional rolling count

  • Related