Home > database >  Split Time Off Hours (Working) by Week (Monday - Sunday) in Python
Split Time Off Hours (Working) by Week (Monday - Sunday) in Python

Time:03-30

I have Time Off data per Employee. I need to split it by Week (Monday - Sunday), but before that, I need to calculate WORKING Days Off and Hours per WORKING Day Off, so if Time Off starts in the middle of the week (Wednesday for example), we will know, that only Hours for 3 working days (Wednesday, Thursday and Friday) will be assigned to that week.

Id     Name       Date Start    Date End   Time Off Hours

1  Tom Holland   2022-04-22    2022-05-06        88.0    

I was able to exclude Weekends and calculate Number of WORKING Days Off and Hours per WORKING Day Off.

test = {'Id': [1], 'Name': ['Tom Holland'], 'Date Start': ['2022-04-22'], 'Date End': ['2022-05-06'], 'Time Off Hours': [88.0]}

df = pd.DataFrame(data=test)

time_diff = []

for i in df.index:

    time_diff.append(np.busday_count(df["Date Start"][i], df["Date End"][i], weekmask=[1,1,1,1,1,0,0])   1)

df["Days Off (Working)"] = time_diff

df['Hours per Days Off (Working)'] = df["Time Off Hours"] / df["Days Off (Working)"]

The output is:

Id     Name       Date Start    Date End   Time Off Hours   Days Off (Working)  Hours per Days Off (Working)

1  Tom Holland   2022-04-22    2022-05-06        88.0            11                    8.0

And now I need to split and then group this record in 3 records (in this case), because 2022-04-22 and 2022-05-06 date range falls in 3 weeks period (Monday to Sunday):

  1. from 2022-04-18 to 2022-04-24 week (1 Working Day Off = 8 Hours)
  2. from 2022-04-25 to 2022-05-01 week (5 Working Day Off = 40 Hours)
  3. from 2022-05-02 to 2022-05-08 week (5 Working Day Off = 40 Hours)

Desired output should look similar to that:

Id Name Week Start Week End Days Off (Working) Hours per Days Off (Working) Total Off Hours
1 Tom Holland 2022-04-18 2022-04-24 1 8.0 8.0
1 Tom Holland 2022-04-25 2022-05-01 5 8.0 40.0
1 Tom Holland 2022-05-02 2022-05-08 5 8.0 40.0

CodePudding user response:

This is not the most concise way of doing it, but it does the job. First I've created your example df

test = {'Id': [1], 'Name': ['Tom Holland'], 'Date Start': ['2022-04-22'], 'Date End': ['2022-05-06'], 'Time Off Hours': [88.0]}
df = pd.DataFrame(data=test)

Then I've created a helper function that will help me later to calculate the number of working days of each week based also using the Date Start and Date End in case they start or end before the week star and end

# You could try to use np.select to optimize this part
def get_work_days(row: pd.Series) -> int:
  start = row["Date Start"]
  end = row["Date End"]
  week_start = row["Week Start"]
  week_end = row["Week End"]

  if week_start <= start <= week_end:
    bdays = len(pd.bdate_range(start, week_end))
  elif week_start <= end <= week_end:
    bdays = len(pd.bdate_range(week_start, end))
  elif week_start <= end <= week_end and week_start <= start <= week_end:
    bdays = len(pd.bdate_range(start, end))
  else:
    bdays = len(pd.bdate_range(week_start, week_end))
  
  return bdays

And finally the process part to return your desired output

def process_dataframe(df: pd.DataFrame) -> pd.DataFrame:
  # Making sure that these columns are datetime
  df["Date Start"] = pd.to_datetime(df["Date Start"])
  df["Date End"] = pd.to_datetime(df["Date End"])
  # Calculating Working days between date start and date end
  df["Days Off (Working)"] = df.apply(lambda row: len(pd.bdate_range(row["Date Start"], row["Date End"])), axis=1)
  df['Hours per Days Off (Working)'] = df["Time Off Hours"] / df["Days Off (Working)"]
  # Creating Week Start values 
  df["Week Start"] = df.apply(
      lambda row: pd.date_range(
          start=row["Date Start"].to_period("W").start_time, 
          end=row["Date End"].to_period("W").start_time, 
          freq="7D"
    ),
    axis=1
  )
  # Creating Week End values
  df["Week End"] = df.apply(
      lambda row: pd.date_range(
          start=row["Date Start"].to_period("W").end_time, 
          end=row["Date End"].to_period("W").end_time, 
          freq="7D"
    ),
    axis=1
  )
  # Exploding the values, since the way they were created made them as a DatetimeIndex
  # field.
  df = df.explode(["Week Start", "Week End"])
  # Just did that because the Week End had a weird time due to .end_time 
  df["Week End"] = pd.to_datetime(df["Week End"].dt.date)
  df["Week Start"] = pd.to_datetime(df["Week Start"].dt.date)
  # Using the helper function to calculate the working days
  df["Days Off (Working)"] = df.apply(get_work_days, axis=1)
  df["Total Off Hours"] = df["Days Off (Working)"] * df["Hours per Days Off (Working)"]
  return df[["Name", "Week Start", "Week End", "Days Off (Working)", "Hours per Days Off (Working)", "Total Off Hours"]]
  • Related