I have a dataframe that has columns like these:
Date temp_data holiday
01.01.2000 10000 0
02.01.2000 0 1
03.01.2000 0 1
04.01.2000 0 1
05.01.2000 0 1
06.01.2000 23000 0
..
..
..
30.01.2000 200 0
31.01.2000 0 1
01.02.2000 0 1
02.02.2000 2500 0
holiday = 0 when there is data present - indicates a working day
holiday = 1 when there is no data present - indicated a non-working day
I am trying to extract two new columns pre_long_holiday and post_long_holiday
the dataframe should look like this
Date temp_data holiday pre_long_hol post_long_hol
01.01.2000 10000 0 1 0
02.01.2000 0 1 0 0
03.01.2000 0 1 0 0
04.01.2000 0 1 0 0
05.01.2000 0 1 0 0
06.01.2000 23000 0 0 1
07.01.2000 2000 0 1 0
08.01.2000 0 1 0 0
09.01.2000 0 1 0 0
10.01.2000 0 1 0 0
11.01.2000 1000 0 0 1
..
..
..
30.01.2000 200 0 0 0
31.01.2000 0 1 0 0
01.02.2000 0 1 0 0
02.02.2000 2500 0 0 0
Long_holiday = holidays >=3 consecutive days pre and post columns has 1 before and after the holiday period
Can anyone help me with this?
The data I have is a continuous time series.
CodePudding user response:
If need set only one 1
before and after holiday use Series.rolling
with sum
and test shifted values:
N = 3
m = df['holiday'].eq(0)
s = df['holiday'].rolling(N).sum()
df['pre_long_hol'] = (s.shift(-N).ge(N) & m).astype(int)
df['post_long_hol'] = (s.shift().ge(N) & m).astype(int)
print (df)
Date temp_data holiday pre_long_hol post_long_hol
0 01.01.2000 10000 0 1 0
1 02.01.2000 0 1 0 0
2 03.01.2000 0 1 0 0
3 04.01.2000 0 1 0 0
4 05.01.2000 0 1 0 0
5 06.01.2000 23000 0 0 1
6 07.01.2000 2000 0 1 0
7 08.01.2000 0 1 0 0
8 09.01.2000 0 1 0 0
9 10.01.2000 0 1 0 0
10 11.01.2000 1000 0 0 1
11 30.01.2000 200 0 0 0
12 31.01.2000 0 1 0 0
13 01.02.2000 0 1 0 0
14 02.02.2000 2500 0 0 0
CodePudding user response:
There probably is a more efficient solution but here is what I came up with:
df['pre_holiday'] = 0
for i in range(len(df)):
limit = len(df) - 3
if i < limit:
if df.iloc[i 1].holiday == 1 and df.iloc[i 2].holiday == 1 and df.iloc[i 3].holiday == 1 and df.iloc[i].quantity != 0:
df.at[i, 'pre_holiday'] = 1
df['post_holiday'] = 0
for i in range(len(df)):
limit = 3
if i > limit:
if df.iloc[i-1].holiday == 1 and df.iloc[i-2].holiday == 1 and df.iloc[i-3].holiday == 1 and df.iloc[i].quantity != 0:
df.at[i, 'post_holiday'] = 1