The following is my code:
x = ts.loc[::-1, "validday"].eq(0)
x = x.groupby(x.index.to_period('M'), sort=False).cumsum().head(35)
x.head(35)
Current Output:
Date
2022-11-14 1
2022-11-13 1
2022-11-12 1
2022-11-11 2
2022-11-10 3
2022-11-09 4
2022-11-08 5
2022-11-07 6
2022-11-06 6
2022-11-05 7
2022-11-04 7
2022-11-03 8
2022-11-02 9
2022-11-01 10
2022-10-31 1
2022-10-30 1
2022-10-29 2
2022-10-28 3
2022-10-27 4
2022-10-26 4
2022-10-25 5
2022-10-24 5
. .
. .
. .
. .
2019-09-14 .
The reason is to detect the last valid day, second last valid day and third last valid day based on the available dataset
Desired output:
Date last valid day last second valid day last third valid day
2022-11-14 1 1 0 0
2022-11-13 1 0 0 0
2022-11-12 1 0 0 0
2022-11-11 2 0 1 0
2022-11-10 3 0 0 1
2022-11-09 4 0 0 0
2022-11-08 5 0 0 0
2022-11-07 6 0 0 0
2022-11-06 6 0 0 0
2022-11-05 7 0 0 0
2022-11-04 7 0 0 0
2022-11-03 8 0 0 0
2022-11-02 9 0 0 0
2022-11-01 10 0 0 0
2022-10-31 1 1 0 0
2022-10-30 1 0 0 0
2022-10-29 2 0 1 0
2022-10-28 3 0 0 1
.
.
.
2019-09-14 . . . .
Can someone please tell me, how can i achieve this goal?
CodePudding user response:
First select rows with no holidays, so possible use counter by GroupBy.cumcount
- for add missing holiday datetimes is use Series.reindex
:
mask = ts.loc[::-1, "public.holiday"].eq(0)
x = mask.groupby(x.index.to_period('M'), sort=False).cumsum()
#print (x)
s1 = x[mask]
s = (s1.groupby(s1.index.to_period('M'), sort=False).cumcount()
.reindex(x.index, fill_value=-1))
print (s)
Date
2022-11-14 0
2022-11-13 -1
2022-11-12 -1
2022-11-11 1
2022-11-10 2
2022-11-09 3
2022-11-08 4
2022-11-07 5
2022-11-06 -1
2022-11-05 6
2022-11-04 -1
2022-11-03 7
2022-11-02 8
2022-11-01 9
2022-10-31 0
2022-10-30 -1
2022-10-29 1
2022-10-28 2
2022-10-27 3
2022-10-26 -1
2022-10-25 4
2022-10-24 -1
dtype: int64
And then convert Series
to one column DataFrame
and compare by 0,1,2
with casting to integers for expected ouput:
out = x.to_frame().assign(**{'last Working day': s.eq(0).astype(int),
'last second working day': s.eq(1).astype(int),
'last third working day': s.eq(2).astype(int)})
print (out.head())
public.holiday last Working day last second working day \
Date
2022-11-14 1 1 0
2022-11-13 1 0 0
2022-11-12 1 0 0
2022-11-11 2 0 1
2022-11-10 3 0 0
last third working day
Date
2022-11-14 0
2022-11-13 0
2022-11-12 0
2022-11-11 0
2022-11-10 1
print (out.tail(8))
public.holiday last Working day last second working day \
Date
2022-10-31 1 1 0
2022-10-30 1 0 0
2022-10-29 2 0 1
2022-10-28 3 0 0
2022-10-27 4 0 0
2022-10-26 4 0 0
2022-10-25 5 0 0
2022-10-24 5 0 0
last third working day
Date
2022-10-31 0
2022-10-30 0
2022-10-29 0
2022-10-28 1
2022-10-27 0
2022-10-26 0
2022-10-25 0
2022-10-24 0