We need to return the list of dates for n consecutive business days ( Friday to Monday is 1 business day ). where values are not changed. Do not assume that dates column have every single dates. Data frame structure would be as given below
date Value
2022-07-19 44.43000000
2022-07-20 44.43000000
2022-07-21 44.43000000
2022-07-22 44.43000000
2022-07-25 44.43000000
... ...
2022-09-02 86.40000000
2022-09-06 85.13000000
2022-09-07 86.86000000
2022-09-08 88.44000000
2022-09-09 89.44000000
If we assume n is 5. We need to return list of 5 consecutive dates. For above examples answer would be
[2022-07-22,2022-07-20,2022-07-21,2022-07-22,2022-07-25]
I tried below code to get consecutive dates present in data frame but I am unable to get consecutive business days.
for k, v in px_dirty.groupby((px_dirty['value'].shift() != px_dirty['value']).cumsum()):
if len(v) == 5:
print(f'[group {k}]')
print(v)
I am not able to figure out how to get consecutive business days.
CodePudding user response:
Use a date offset
:
from pandas.tseries.offsets import BDay
df['date'] = pd.to_datetime(df['date'])
# identify breaks in successive values
m1 = df['Value'].ne(df['Value'].shift())
# identify breaks in business days
m2 = df['date'].ne(df['date'].shift().add(BDay()))
# group by either break
for k,g in df.groupby((m1|m2).cumsum()):
if len(g) == 5:
print(f'[group {k}]')
print(g)
Output:
[group 1]
date Value
0 2022-07-19 44.43
1 2022-07-20 44.43
2 2022-07-21 44.43
3 2022-07-22 44.43
4 2022-07-25 44.43
Intermediates:
date Value m1 m2 m1|m2 group len
0 2022-07-19 44.43 True True True 1 5
1 2022-07-20 44.43 False False False 1 5
2 2022-07-21 44.43 False False False 1 5
3 2022-07-22 44.43 False False False 1 5
4 2022-07-25 44.43 False False False 1 5
5 2022-09-02 86.40 True True True 2 1
6 2022-09-06 85.13 True True True 3 1
7 2022-09-07 86.86 True False True 4 1
8 2022-09-08 88.44 True False True 5 1
9 2022-09-09 89.44 True False True 6 1
CodePudding user response:
First, create a field with business days that is the product of (weekday < 6) and (week number of the year). Then group and form lists with a length of at least 5 unique elements for each group
df = pd.DataFrame(pd.date_range('2022-01-01', '2022-03-01'))
df['BD'] = df[0].dt.weekday.lt(5).astype(int) * df[0].dt.isocalendar().week
df1 = df.groupby('BD').apply(lambda x: x[0].to_list() if len(set(x[0])) > 4 else None).dropna()
print(df1)
Prints:
BD
1 [2022-01-03 00:00:00, 2022-01-04 00:00:00, 202...
2 [2022-01-10 00:00:00, 2022-01-11 00:00:00, 202...
3 [2022-01-17 00:00:00, 2022-01-18 00:00:00, 202...
4 [2022-01-24 00:00:00, 2022-01-25 00:00:00, 202...
5 [2022-01-31 00:00:00, 2022-02-01 00:00:00, 202...
6 [2022-02-07 00:00:00, 2022-02-08 00:00:00, 202...
7 [2022-02-14 00:00:00, 2022-02-15 00:00:00, 202...
8 [2022-02-21 00:00:00, 2022-02-22 00:00:00, 202...
dtype: object