Home > Software design >  Marking valid days on time series
Marking valid days on time series

Time:01-04

The following is my code:

x = ts.loc[::-1, "validday"].eq(0)
x = x.groupby(x.index.to_period('M'), sort=False).cumsum().head(35)
x.head(35)

Current Output:

    Date
    2022-11-14     1
    2022-11-13     1
    2022-11-12     1
    2022-11-11     2
    2022-11-10     3
    2022-11-09     4
    2022-11-08     5
    2022-11-07     6
    2022-11-06     6
    2022-11-05     7
    2022-11-04     7
    2022-11-03     8
    2022-11-02     9
    2022-11-01     10
    2022-10-31     1
    2022-10-30     1
    2022-10-29     2
    2022-10-28     3
    2022-10-27     4
    2022-10-26     4
    2022-10-25     5
    2022-10-24     5
       .           .
       .           .
       .           .
       .           .
    2019-09-14     .

The reason is to detect the last valid day, second last valid day and third last valid day based on the available dataset

Desired output:

       Date               last valid day   last second valid day   last third valid day
    2022-11-14     1        1                        0                        0
    2022-11-13     1        0                        0                        0
    2022-11-12     1        0                        0                        0
    2022-11-11     2        0                        1                        0
    2022-11-10     3        0                        0                        1
    2022-11-09     4        0                        0                        0
    2022-11-08     5        0                        0                        0
    2022-11-07     6        0                        0                        0
    2022-11-06     6        0                        0                        0
    2022-11-05     7        0                        0                        0
    2022-11-04     7        0                        0                        0
    2022-11-03     8        0                        0                        0
    2022-11-02     9        0                        0                        0
    2022-11-01     10       0                        0                        0
    2022-10-31     1        1                        0                        0
    2022-10-30     1        0                        0                        0
    2022-10-29     2        0                        1                        0
    2022-10-28     3        0                        0                        1
        .
        .
        .
    2019-09-14     .        .                        .                        .

Can someone please tell me, how can i achieve this goal?

CodePudding user response:

First select rows with no holidays, so possible use counter by GroupBy.cumcount - for add missing holiday datetimes is use Series.reindex:

mask = ts.loc[::-1, "public.holiday"].eq(0)
x = mask.groupby(x.index.to_period('M'), sort=False).cumsum()
#print (x)

s1 = x[mask]

s = (s1.groupby(s1.index.to_period('M'), sort=False).cumcount()
       .reindex(x.index, fill_value=-1))
print (s)
Date
2022-11-14    0
2022-11-13   -1
2022-11-12   -1
2022-11-11    1
2022-11-10    2
2022-11-09    3
2022-11-08    4
2022-11-07    5
2022-11-06   -1
2022-11-05    6
2022-11-04   -1
2022-11-03    7
2022-11-02    8
2022-11-01    9
2022-10-31    0
2022-10-30   -1
2022-10-29    1
2022-10-28    2
2022-10-27    3
2022-10-26   -1
2022-10-25    4
2022-10-24   -1
dtype: int64

And then convert Series to one column DataFrame and compare by 0,1,2 with casting to integers for expected ouput:

out = x.to_frame().assign(**{'last Working day': s.eq(0).astype(int),
                              'last second working day': s.eq(1).astype(int),
                              'last third working day': s.eq(2).astype(int)})
print (out.head())
            public.holiday  last Working day  last second working day  \
Date                                                                    
2022-11-14               1                 1                        0   
2022-11-13               1                 0                        0   
2022-11-12               1                 0                        0   
2022-11-11               2                 0                        1   
2022-11-10               3                 0                        0   

            last third working day  
Date                                
2022-11-14                       0  
2022-11-13                       0  
2022-11-12                       0  
2022-11-11                       0  
2022-11-10                       1 

print (out.tail(8))
            public.holiday  last Working day  last second working day  \
Date                                                                    
2022-10-31               1                 1                        0   
2022-10-30               1                 0                        0   
2022-10-29               2                 0                        1   
2022-10-28               3                 0                        0   
2022-10-27               4                 0                        0   
2022-10-26               4                 0                        0   
2022-10-25               5                 0                        0   
2022-10-24               5                 0                        0   

            last third working day  
Date                                
2022-10-31                       0  
2022-10-30                       0  
2022-10-29                       0  
2022-10-28                       1  
2022-10-27                       0  
2022-10-26                       0  
2022-10-25                       0  
2022-10-24                       0  
  • Related