I have Time Off data per Employee. I need to split it by Week (Monday - Sunday), but before that, I need to calculate WORKING Days Off and Hours per WORKING Day Off, so if Time Off starts in the middle of the week (Wednesday for example), we will know, that only Hours for 3 working days (Wednesday, Thursday and Friday) will be assigned to that week.
Id Name Date Start Date End Time Off Hours
1 Tom Holland 2022-04-22 2022-05-06 88.0
I was able to exclude Weekends and calculate Number of WORKING Days Off and Hours per WORKING Day Off.
test = {'Id': [1], 'Name': ['Tom Holland'], 'Date Start': ['2022-04-22'], 'Date End': ['2022-05-06'], 'Time Off Hours': [88.0]}
df = pd.DataFrame(data=test)
time_diff = []
for i in df.index:
time_diff.append(np.busday_count(df["Date Start"][i], df["Date End"][i], weekmask=[1,1,1,1,1,0,0]) 1)
df["Days Off (Working)"] = time_diff
df['Hours per Days Off (Working)'] = df["Time Off Hours"] / df["Days Off (Working)"]
The output is:
Id Name Date Start Date End Time Off Hours Days Off (Working) Hours per Days Off (Working)
1 Tom Holland 2022-04-22 2022-05-06 88.0 11 8.0
And now I need to split and then group this record in 3 records (in this case), because 2022-04-22 and 2022-05-06 date range falls in 3 weeks period (Monday to Sunday):
- from 2022-04-18 to 2022-04-24 week (1 Working Day Off = 8 Hours)
- from 2022-04-25 to 2022-05-01 week (5 Working Day Off = 40 Hours)
- from 2022-05-02 to 2022-05-08 week (5 Working Day Off = 40 Hours)
Desired output should look similar to that:
Id | Name | Week Start | Week End | Days Off (Working) | Hours per Days Off (Working) | Total Off Hours |
---|---|---|---|---|---|---|
1 | Tom Holland | 2022-04-18 | 2022-04-24 | 1 | 8.0 | 8.0 |
1 | Tom Holland | 2022-04-25 | 2022-05-01 | 5 | 8.0 | 40.0 |
1 | Tom Holland | 2022-05-02 | 2022-05-08 | 5 | 8.0 | 40.0 |
CodePudding user response:
This is not the most concise way of doing it, but it does the job. First I've created your example df
test = {'Id': [1], 'Name': ['Tom Holland'], 'Date Start': ['2022-04-22'], 'Date End': ['2022-05-06'], 'Time Off Hours': [88.0]}
df = pd.DataFrame(data=test)
Then I've created a helper function that will help me later to calculate the number of working days of each week based also using the Date Start
and Date End
in case they start or end before the week star and end
# You could try to use np.select to optimize this part
def get_work_days(row: pd.Series) -> int:
start = row["Date Start"]
end = row["Date End"]
week_start = row["Week Start"]
week_end = row["Week End"]
if week_start <= start <= week_end:
bdays = len(pd.bdate_range(start, week_end))
elif week_start <= end <= week_end:
bdays = len(pd.bdate_range(week_start, end))
elif week_start <= end <= week_end and week_start <= start <= week_end:
bdays = len(pd.bdate_range(start, end))
else:
bdays = len(pd.bdate_range(week_start, week_end))
return bdays
And finally the process part to return your desired output
def process_dataframe(df: pd.DataFrame) -> pd.DataFrame:
# Making sure that these columns are datetime
df["Date Start"] = pd.to_datetime(df["Date Start"])
df["Date End"] = pd.to_datetime(df["Date End"])
# Calculating Working days between date start and date end
df["Days Off (Working)"] = df.apply(lambda row: len(pd.bdate_range(row["Date Start"], row["Date End"])), axis=1)
df['Hours per Days Off (Working)'] = df["Time Off Hours"] / df["Days Off (Working)"]
# Creating Week Start values
df["Week Start"] = df.apply(
lambda row: pd.date_range(
start=row["Date Start"].to_period("W").start_time,
end=row["Date End"].to_period("W").start_time,
freq="7D"
),
axis=1
)
# Creating Week End values
df["Week End"] = df.apply(
lambda row: pd.date_range(
start=row["Date Start"].to_period("W").end_time,
end=row["Date End"].to_period("W").end_time,
freq="7D"
),
axis=1
)
# Exploding the values, since the way they were created made them as a DatetimeIndex
# field.
df = df.explode(["Week Start", "Week End"])
# Just did that because the Week End had a weird time due to .end_time
df["Week End"] = pd.to_datetime(df["Week End"].dt.date)
df["Week Start"] = pd.to_datetime(df["Week Start"].dt.date)
# Using the helper function to calculate the working days
df["Days Off (Working)"] = df.apply(get_work_days, axis=1)
df["Total Off Hours"] = df["Days Off (Working)"] * df["Hours per Days Off (Working)"]
return df[["Name", "Week Start", "Week End", "Days Off (Working)", "Hours per Days Off (Working)", "Total Off Hours"]]