Calculate working hours in Python pandas: total hours worked, and hours worked in the morning, afternoon and evening

I'm pretty new to Python and pandas (beginner level).

I have a pandas dataframe for which I'm trying to calculate how many hours/minutes/seconds were worked in the morning, afternoon and evening.

Mornings are the time range 03:00 to 12:00, afternoons are 12:00 to 17:00, and evenings are 17:00 to 03:00 (the next day).
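Because the evening range crosses midnight, it cannot be checked with a single `start <= t < end` comparison on clock times. A minimal sketch that classifies one clock time, assuming each range includes its start and excludes its end (so 12:00 counts as afternoon); `shift_of` is a hypothetical helper name, not part of the question:

```python
from datetime import time


def shift_of(t: time) -> str:
    """Return the shift a clock time falls in (hypothetical helper).

    Assumption: ranges are half-open, e.g. 03:00 <= t < 12:00 is morning.
    """
    if time(3, 0) <= t < time(12, 0):
        return "morning"
    if time(12, 0) <= t < time(17, 0):
        return "afternoon"
    # Everything else wraps around midnight: 17:00-23:59 and 00:00-02:59
    return "evening"


print(shift_of(time(10, 0)))   # morning
print(shift_of(time(16, 13)))  # afternoon
print(shift_of(time(2, 33)))   # evening
```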

import pandas as pd

data = [ ('employee1', '2022-10-28', '12:06', '13:00:00', '00:00:00', '00:00:23'),
        ('employee2','2022-10-28', '10:00', '06:00:00', '00:00:00', '00:00:16'),
        ('employee3', '2022-05-06', '16:13', '08:00:00', '00:54:00', '00:00:09'),
        ('employee4', '2022-06-03', '2:33', '09:00:00', '00:19:00', '00:00:56'),
        ('employee5', '2022-08-12', '9:50', '20:00:00', '00:27:00', '00:00:22'),
        ('employee6', '2022-02-15', '6:52', '00:00:00', '00:35:00','00:00:35')]


df = pd.DataFrame(data, columns =['Name','date','start_time','hours_worked','minutes_worked','seconds_worked'])
df[['enddate_time','time_worked_morning','time_worked_afternoon','time_worked_evening','total_time_wkd']]=None

Can someone please help me calculate how much of the total time worked falls within each of these 3 ranges?

Answer:

Since you mentioned you are a beginner, I will include some extra comments in my code. I will also present 2 solutions: a "slow" one that mostly uses Python loops and a "fast" one that uses vectorized NumPy code.

Both solutions will use the following code snippet. All dates/times in your dataframe are stored as strings, which are hard to work with. Convert them to pd.Timestamp and pd.Timedelta for easier manipulation:

import numpy as np
import pandas as pd

start_time = pd.to_datetime(df["date"] + " " + df["start_time"])
time_worked = (
    pd.to_timedelta(df["hours_worked"])
    + pd.to_timedelta(df["minutes_worked"])
    + pd.to_timedelta(df["seconds_worked"])
)
end_time = start_time + time_worked
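To check the conversion, here is a small self-contained run of the same snippet on two rows copied from the question. Note that the end time can land on the next calendar day (employee5 starts at 09:50 and works 20:27:22):

```python
import pandas as pd

# Two rows copied from the question's sample data
df = pd.DataFrame(
    [("employee4", "2022-06-03", "2:33", "09:00:00", "00:19:00", "00:00:56"),
     ("employee5", "2022-08-12", "9:50", "20:00:00", "00:27:00", "00:00:22")],
    columns=["Name", "date", "start_time", "hours_worked",
             "minutes_worked", "seconds_worked"],
)

# Join date and time strings, then parse them into Timestamps
start_time = pd.to_datetime(df["date"] + " " + df["start_time"])
# Add the three duration columns into a single Timedelta per row
time_worked = (
    pd.to_timedelta(df["hours_worked"])
    + pd.to_timedelta(df["minutes_worked"])
    + pd.to_timedelta(df["seconds_worked"])
)
end_time = start_time + time_worked
print(end_time.tolist())
```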

The slow solution

Loop over each employee, then over each shift to see how much time they work in that shift. Then tally up the results and assign them as new columns in your dataframe:

# You did not specify a unit for your time_worked_morning, _afternoon, etc.
# I assume you want to measure them in hours.
unit = pd.Timedelta(1, "h")

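# The shift boundaries. There are 4 boundaries for 3 shifts because we need to
# account for the start and end time of each shift.
# Shift start time: 3,  12, 17
# Shift end time:   12, 17, 27
# The numbers represent hours since midnight of the day the work starts, so 27
# means 03:00 the next day. Caveat: work done before 03:00 on the start day
# (employee4 starts at 2:33) or after 03:00 the next day (employee5 works until
# about 06:17 the next morning) falls outside these boundaries and is not
# counted in any shift.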
boundaries = [pd.Timedelta(i, "h") for i in [3, 12, 17, 27]]

# Now loop through each employee's work record to break down their working hours
# into shifts
data = []
for work_st, work_et in zip(start_time, end_time):
    # Get the midnight of the day
    start_of_day = work_st.normalize()
    # Work time in each shift starts with 0
    work_time = [0] * (len(boundaries) - 1)

    # loop through the shifts
    for i, (lb, ub) in enumerate(zip(boundaries[:-1], boundaries[1:])):
        # Calculate the start and end time of each shift
        shift_st = start_of_day + lb
        shift_et = start_of_day + ub

        # Work time in shift = (effective end time) - (effective start time)
        # Effective end time   = (work end time)   or (shift end time)  , whichever is EARLIER => use min
        # Effective start time = (work start time) or (shift start time), whichever is LATER   => use max
        # The "/ unit" operation is to convert the pd.Timedelta object into a
        # float representing the hour
        t = (min(work_et, shift_et) - max(work_st, shift_st)) / unit

        # When the work period and the shift do not overlap, the formula above
        # produces an `Effective end time` that is before the `Effective start
        # time`. An employee can't spend negative time in a shift, so clip it to 0.
        work_time[i] = max(0, t)

    data.append(work_time)

# Convert `data` to a NumPy array for easier slicing
data = np.array(data)

# Add the extra columns to your data frame
df["enddate_time"] = end_time
df["time_worked_morning"] = data[:, 0]
df["time_worked_afternoon"] = data[:, 1]
df["time_worked_evening"] = data[:, 2]
df["total_time_wkd"] = time_worked / unit
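With the question's data, employee4 (starting at 2:33) and employee5 (ending at about 06:17 the next day) have time outside the 03:00 to 27:00 window, so their three shift columns do not add up to `total_time_wkd`. One way to handle this, sketched under the assumption that no single record is longer than 24 hours, is to widen the window from 17:00 the previous evening (-7) to 03:00 two days later (51) and label each interval with the shift it belongs to. `edges`, `labels` and `shifts` are names introduced here for illustration:

```python
import pandas as pd

# Sample data from the question
data = [('employee1', '2022-10-28', '12:06', '13:00:00', '00:00:00', '00:00:23'),
        ('employee2', '2022-10-28', '10:00', '06:00:00', '00:00:00', '00:00:16'),
        ('employee3', '2022-05-06', '16:13', '08:00:00', '00:54:00', '00:00:09'),
        ('employee4', '2022-06-03', '2:33', '09:00:00', '00:19:00', '00:00:56'),
        ('employee5', '2022-08-12', '9:50', '20:00:00', '00:27:00', '00:00:22'),
        ('employee6', '2022-02-15', '6:52', '00:00:00', '00:35:00', '00:00:35')]
df = pd.DataFrame(data, columns=['Name', 'date', 'start_time', 'hours_worked',
                                 'minutes_worked', 'seconds_worked'])

start_time = pd.to_datetime(df["date"] + " " + df["start_time"])
time_worked = (pd.to_timedelta(df["hours_worked"])
               + pd.to_timedelta(df["minutes_worked"])
               + pd.to_timedelta(df["seconds_worked"]))
end_time = start_time + time_worked
unit = pd.Timedelta(1, "h")

# Hours since midnight of the start day. Assumes no record exceeds 24 hours,
# so every work period lies inside [-7, 51).
edges = [-7, 3, 12, 17, 27, 36, 41, 51]
labels = ["evening", "morning", "afternoon", "evening",
          "morning", "afternoon", "evening"]

rows = []
for work_st, work_et in zip(start_time, end_time):
    start_of_day = work_st.normalize()
    totals = {"morning": 0.0, "afternoon": 0.0, "evening": 0.0}
    for lb, ub, label in zip(edges[:-1], edges[1:], labels):
        shift_st = start_of_day + pd.Timedelta(lb, "h")
        shift_et = start_of_day + pd.Timedelta(ub, "h")
        # Same overlap formula as above: min(ends) - max(starts), floored at 0
        overlap = (min(work_et, shift_et) - max(work_st, shift_st)) / unit
        totals[label] += max(0.0, overlap)
    rows.append(totals)

shifts = pd.DataFrame(rows, index=df.index)
df["enddate_time"] = end_time
df["time_worked_morning"] = shifts["morning"]
df["time_worked_afternoon"] = shifts["afternoon"]
df["time_worked_evening"] = shifts["evening"]
df["total_time_wkd"] = time_worked / unit
```

Accumulating with `+=` per label is what lets two separate intervals (for example, a morning on the start day and a morning on the next day) feed the same column.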

The fast solution

Instead of working with one employee and one shift at a time, we will deal with 2D arrays that hold all employees in all shifts (one row per employee, one column per shift). This lets us use several of NumPy's vectorized operations.
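The 2D layout relies on NumPy broadcasting: a column of shape (N, 1) combined with a row of shape (n,) yields an (N, n) array. A small illustration using three of the question's start dates and times; `offsets`, `starts` and `tiled` are names chosen here for the example:

```python
import numpy as np

# Midnights of three start days as a column vector, shape (3, 1)
start_of_day = np.array(["2022-10-28", "2022-06-03", "2022-08-12"],
                        dtype="datetime64[ns]")[:, None]
# Shift start offsets in hours since midnight, shape (3,)
offsets = np.array([3, 12, 17], dtype="timedelta64[h]")

# (3, 1) + (3,) broadcasts to (3, 3): one row per employee, one column per shift
shift_st = start_of_day + offsets
print(shift_st.shape)  # (3, 3)

# np.tile(...).T repeats each employee's start time once per shift (column)
starts = np.array(["2022-10-28T12:06", "2022-06-03T02:33", "2022-08-12T09:50"],
                  dtype="datetime64[ns]")
tiled = np.tile(starts, (3, 1)).T
print(tiled.shape)  # (3, 3)
```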

# The boundaries are the same as in the "slow" solution, only stored as a
# NumPy array instead of a list
boundaries = np.array([np.timedelta64(i, "h") for i in [3, 12, 17, 27]])
n = len(boundaries) - 1

# For every employee, repeat the work start and end times once per shift
work_st = np.tile(start_time.to_numpy(), (n, 1)).T
work_et = np.tile(end_time.to_numpy(), (n, 1)).T

# Calculate the shift's start and end time for each day
start_of_day = start_time.dt.normalize().to_numpy()[:, None]
shift_st = start_of_day + boundaries[:-1]
shift_et = start_of_day + boundaries[1:]

# The effective start and end time are calculated identically to above, only
# in vectorized form
effective_st = np.max([work_st, shift_st], axis=0)
effective_et = np.min([work_et, shift_et], axis=0)

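# The time worked per shift uses the same calculation as the slow solution:
# effective end time - effective start time, with a minimum of 0.
# `unit` (1 hour) is repeated here so this solution also runs on its own.
unit = pd.Timedelta(1, "h")
data = np.clip((effective_et - effective_st) / unit, 0, None)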

# Add the extra columns to your data frame
df["enddate_time"] = end_time
df["time_worked_morning"] = data[:, 0]
df["time_worked_afternoon"] = data[:, 1]
df["time_worked_evening"] = data[:, 2]
df["total_time_wkd"] = time_worked / unit