How to calculate number of days until weekend or day off in pandas dataframe

Time:06-08

I have a pandas DataFrame with a non-continuous date index (weekends and holidays are missing). I want to add a column containing the number of days until the next day off.

Here is code that generates an example DataFrame with the desired values in the till_day_off column:

import pandas as pd

df = pd.DataFrame(index=pd.date_range(start="2022-06-06", periods=15))
df["day_of_week"] = df.index.dayofweek   # adding column with number of day in a week
df = df[(df.day_of_week < 5)]   # remove weekends
df = df.drop(index="2022-06-15")   # remove Wednesday in second week
df["till_day_off"] = [5,4,3,2,1,2,1,2,1,1] # desired values, end of column is treated as day off

Resulting dataframe:

            day_of_week  till_day_off
2022-06-06            0             5
2022-06-07            1             4
2022-06-08            2             3
2022-06-09            3             2
2022-06-10            4             1
2022-06-13            0             2
2022-06-14            1             1
2022-06-16            3             2
2022-06-17            4             1
2022-06-20            0             1

The real DataFrame has over 7,000 rows, so I am trying to avoid iterating over rows. Any idea how to tackle this?

Answer:

Assuming the input is sorted (if not, sort it by date first), you can use a mask to flag each row whose next row is exactly one day later, use that mask to group consecutive days, and compute a cumcount within each group:

mask = (-df.index.to_series().diff(-1)).eq('1d').iloc[::-1]
# reversing the Series to count until (not since) the value

df['till_day_off'] = mask.groupby((~mask).cumsum()).cumcount().add(1)

output:

            day_of_week  till_day_off
2022-06-06            0             5
2022-06-07            1             4
2022-06-08            2             3
2022-06-09            3             2
2022-06-10            4             1
2022-06-13            0             2
2022-06-14            1             1
2022-06-16            3             2
2022-06-17            4             1
2022-06-20            0             1

intermediates:

mask

2022-06-20    False
2022-06-17    False
2022-06-16     True
2022-06-14    False
2022-06-13     True
2022-06-10    False
2022-06-09     True
2022-06-08     True
2022-06-07     True
2022-06-06     True
dtype: bool

(~mask).cumsum()

2022-06-20    1
2022-06-17    2
2022-06-16    2
2022-06-14    3
2022-06-13    3
2022-06-10    4
2022-06-09    4
2022-06-08    4
2022-06-07    4
2022-06-06    4
dtype: int64
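For reuse, the same steps can be wrapped in a small function. This is a sketch, not the answer author's code: the name days_until_day_off is hypothetical, it builds the mask with shift and pd.Timedelta(days=1) instead of diff(-1) and the '1d' string, and it assumes a sorted, duplicate-free DatetimeIndex.

```python
import pandas as pd


def days_until_day_off(index: pd.DatetimeIndex) -> pd.Series:
    """Hypothetical helper: days until the next date missing from a sorted,
    duplicate-free DatetimeIndex. Any gap counts as a day off, and the day
    after the last row is treated as a day off too."""
    dates = index.to_series()
    # True where the next row is exactly one calendar day later
    consecutive = dates.shift(-1).sub(dates).eq(pd.Timedelta(days=1))
    # Walk backwards so the count runs towards (not away from) the day off
    rev = consecutive.iloc[::-1]
    # Each False (a row right before a gap) starts a new run
    run_id = (~rev).cumsum()
    # Number the rows within each run starting at 1, then restore the original order
    return rev.groupby(run_id).cumcount().add(1).reindex(index)


# Example frame from the question
df = pd.DataFrame(index=pd.date_range(start="2022-06-06", periods=15))
df["day_of_week"] = df.index.dayofweek
df = df[df.day_of_week < 5].drop(index=pd.Timestamp("2022-06-15"))

df["till_day_off"] = days_until_day_off(df.index)
print(df["till_day_off"].tolist())  # [5, 4, 3, 2, 1, 2, 1, 2, 1, 1]
```

Because the result is indexed by date, assigning it to a column aligns on the index, so the reversal inside the function does not affect the final column order.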

Answer:

If I understand your question correctly, you could use apply:

def calc_to_days_off(day):
    # do your calculation here; as a placeholder, this counts the days
    # until Saturday and ignores holidays
    return 5 - day

df["till_day_off"] = df["day_of_week"].apply(calc_to_days_off)
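The apply answer leaves the calculation open. As a sketch, assuming the only days off are weekends (which the question rules out, since holidays are missing too), a per-row rule based on day_of_week alone reduces to 5 minus the weekday number, and that can be computed on the whole column without apply. Comparing the result with the desired till_day_off values shows where a weekday-only rule breaks: the rows before the removed Wednesday and the last row.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame(index=pd.date_range(start="2022-06-06", periods=15))
df["day_of_week"] = df.index.dayofweek
df = df[df.day_of_week < 5].drop(index=pd.Timestamp("2022-06-15"))

# Hypothetical weekend-only rule: Monday (0) is 5 days from Saturday,
# Friday (4) is 1 day. Vectorized column arithmetic, so no apply is needed.
weekend_only = 5 - df["day_of_week"]
print(weekend_only.tolist())  # [5, 4, 3, 2, 1, 5, 4, 2, 1, 5]

# Desired values from the question, for comparison
desired = [5, 4, 3, 2, 1, 2, 1, 2, 1, 1]
mismatches = [d.date().isoformat() for d, got, want
              in zip(df.index, weekend_only, desired) if got != want]
print(mismatches)  # ['2022-06-13', '2022-06-14', '2022-06-20']
```

Any approach that only looks at the weekday of the current row cannot see holidays, so it has to consult the neighbouring dates in the index, as the other answers do.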

Answer:

Create a DataFrame of the missing dates (the days off), then use an asof merge with direction='forward' to match each row with the closest day off in the future, and compute the number of days until it.

import pandas as pd

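# DataFrame of missing dates, i.e. days off; the range covers the whole index
# plus one extra day so that the last row always has a later day off
df1 = pd.DataFrame({'day_off': pd.date_range(df.index.min(), df.index.max() + pd.Timedelta(days=1))})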
df1 = df1[~df1['day_off'].isin(df.index)]

df = pd.merge_asof(df, df1, left_index=True, right_on='day_off', direction='forward')
df['till_day_off'] = (df['day_off'] - df.index).dt.days

print(df)

            day_of_week    day_off  till_day_off
2022-06-06            0 2022-06-11             5
2022-06-07            1 2022-06-11             4
2022-06-08            2 2022-06-11             3
2022-06-09            3 2022-06-11             2
2022-06-10            4 2022-06-11             1
2022-06-13            0 2022-06-15             2
2022-06-14            1 2022-06-15             1
2022-06-16            3 2022-06-18             2
2022-06-17            4 2022-06-18             1
2022-06-20            0 2022-06-21             1
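The forward asof merge amounts to looking up, for each row, the first day off after it. As an alternative sketch (a swapped-in technique, not this answer's code), the same lookup can be done with DatetimeIndex.searchsorted on the sorted days off, building the calendar from the index's own first and last dates plus one day. It assumes a sorted, duplicate-free index.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame(index=pd.date_range(start="2022-06-06", periods=15))
df["day_of_week"] = df.index.dayofweek
df = df[df.day_of_week < 5].drop(index=pd.Timestamp("2022-06-15"))

# Every calendar date from the first row to one day past the last row;
# the extra day guarantees each row has a later day off
all_days = pd.date_range(df.index.min(), df.index.max() + pd.Timedelta(days=1))
days_off = all_days.difference(df.index)  # sorted dates missing from the index

# For each row, position of the first day off later than it
# (a row is never itself a day off, so side="left" finds the next one)
pos = days_off.searchsorted(df.index)
df["till_day_off"] = (days_off[pos] - df.index).days
print(df["till_day_off"].tolist())  # [5, 4, 3, 2, 1, 2, 1, 2, 1, 1]
```

This avoids building a second DataFrame and merging; both versions rely on the days off being sorted, which difference guarantees here.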