I want to create a new column in pandas data frame 'day_after_long_weekend' based on the conditions shown in the image.
Sat and Sun are default holidays
- condition 1 - if Fri is a holiday (national/provincial) then Monday is the day after long weekend.
- condition 2 - if Fri and following Monday is a holiday then Tuesday is the day after long weekend.
- condition 3 - if Mon and Tuesday is a holiday then Wednesday will be 1, etc.
df = pd.DataFrame([[0, 1, 'Fri'], [0, 0, 'Sat'], [0, 0, 'Sun'], [0, 0, 'Mon'],[0, 0, 'Tue'],[0, 0, 'Wed'],[0, 0, 'Thu'],[0, 1, 'Fri'],[0, 0, 'Sat'],[0, 0, 'Sun'],[1, 0, 'Mon'],[0, 0, 'Tue']] , columns=['national_holiday', 'provincial_holiday','day_of week'])
How can I do this? Any help will be appreciated.
CodePudding user response:
Use masks and a rolling
sum:
# is a weekend?
m1 = df['day_of week'].isin(['Sat', 'Sun'])
# is a holiday?
m2 = df[['national_holiday', 'provincial_holiday']].eq(1).any(axis=1)
# weekend or holiday
m = (m1|m2)
# is there 3 weekend days or holidays in the last 3 days
# and today is a working day?
df['day_after_long_weekend'] = m.rolling(3).sum().shift().eq(3) & ~m
Output:
Code
In [19]:
m1 = df['day_of week'].isin(['Sat', 'Sun'])
m2 = df[['national_holiday', 'provincial_holiday']].eq(1).any(axis=1)
m = (m1|m2)
df['day_after_long_weekend'] = (m.rolling(3).sum().shift().eq(3) & ~m).astype(int)
print(df)
national_holiday provincial_holiday day_of week day_after_long_weekend
0 0 1 Fri 0
1 0 0 Sat 0
2 0 0 Sun 0
3 0 0 Mon 1
4 0 0 Tue 0
5 0 0 Wed 0
6 0 0 Thu 0
7 0 1 Fri 0
8 0 0 Sat 0
9 0 0 Sun 0
10 1 0 Mon 0
11 0 0 Tue 1
CodePudding user response:
I first added columns here for is_weekend
and is_holiday
like so:
df['is_weekend'] = df['day_of_week'].isin(['Saturday','Sunday'])
df['is_holiday'] = df['national_holiday'] | df['provincial_holiday'] | df['is_weekend'] # i set up the columns to be True/False instead of 1/0, so check that if you have an issue
So now we have df['is_holiday']
that is True/False
for if work is cancelled that day.
Now we want to find days where: (1) it is not a holiday, (2) yesterday was a holiday, and (3) there were more than two holidays in a row. We can do that like this:
df['long_weekend'] = (df['holiday'] == False) & (df['holiday'].shift(1) == True) & ((df.groupby((df['holiday'] != df['holiday'].shift(1)).cumsum()).cumcount() 1).shift(1) > 2)
I did pull the logic for the cumulative count of previous days from this other SO question/answer: Pandas: conditional rolling count