Home > database >  generate columns indicating pre and post holiday
generate columns indicating pre and post holiday

Time:09-21

I have a dataframe that has columns like these:

Date          temp_data        holiday              

01.01.2000    10000              0                
02.01.2000    0                  1                
03.01.2000    0                  1                
04.01.2000    0                  1
05.01.2000    0                  1
06.01.2000    23000              0
..
..
..
30.01.2000    200                0                
31.01.2000     0                 1                
01.02.2000     0                 1                 
02.02.2000    2500               0                

holiday = 0 when there is data present - indicates a working day

holiday = 1 when there is no data present - indicated a non-working day

I am trying to extract two new columns pre_long_holiday and post_long_holiday

the dataframe should look like this

 Date          temp_data      holiday   pre_long_hol   post_long_hol 

01.01.2000    10000              0                1            0
02.01.2000    0                  1                0            0
03.01.2000    0                  1                0            0
04.01.2000    0                  1                0            0
05.01.2000    0                  1                0            0
06.01.2000    23000              0                0            1
07.01.2000    2000               0                1            0
08.01.2000    0                  1                0            0
09.01.2000    0                  1                0            0
10.01.2000    0                  1                0            0
11.01.2000    1000               0                0            1
..
..
..
30.01.2000    200                0                0            0          
31.01.2000     0                 1                0            0
01.02.2000     0                 1                0            0
02.02.2000    2500               0                0            0

Long_holiday = holidays >=3 consecutive days pre and post columns has 1 before and after the holiday period

Can anyone help me with this?

The data I have is a continuous time series.

CodePudding user response:

If need set only one 1 before and after holiday use Series.rolling with sum and test shifted values:

N = 3
m = df['holiday'].eq(0)
s = df['holiday'].rolling(N).sum()
df['pre_long_hol'] =  (s.shift(-N).ge(N) & m).astype(int)
df['post_long_hol'] = (s.shift().ge(N) & m).astype(int)

print (df)
          Date  temp_data  holiday  pre_long_hol  post_long_hol
0   01.01.2000      10000        0             1              0
1   02.01.2000          0        1             0              0
2   03.01.2000          0        1             0              0
3   04.01.2000          0        1             0              0
4   05.01.2000          0        1             0              0
5   06.01.2000      23000        0             0              1
6   07.01.2000       2000        0             1              0
7   08.01.2000          0        1             0              0
8   09.01.2000          0        1             0              0
9   10.01.2000          0        1             0              0
10  11.01.2000       1000        0             0              1
11  30.01.2000        200        0             0              0
12  31.01.2000          0        1             0              0
13  01.02.2000          0        1             0              0
14  02.02.2000       2500        0             0              0

CodePudding user response:

There probably is a more efficient solution but here is what I came up with:

df['pre_holiday'] = 0

for i in range(len(df)):
    limit = len(df) - 3
    if i < limit:
        if df.iloc[i 1].holiday == 1 and df.iloc[i 2].holiday == 1 and df.iloc[i 3].holiday == 1 and df.iloc[i].quantity != 0:
            df.at[i, 'pre_holiday'] = 1

df['post_holiday'] = 0

for i in range(len(df)):
    limit = 3
    if i > limit:
        if df.iloc[i-1].holiday == 1 and df.iloc[i-2].holiday == 1 and df.iloc[i-3].holiday == 1 and df.iloc[i].quantity != 0:
            df.at[i, 'post_holiday'] = 1
  • Related