Date-grouping by shift type in pandas

Time:12-07

I have some mock data of employees working for a company in different shifts. Each shift is an 8-day cycle split into working and non-working days, so it may be 5-3 (5 days working, 3 not working), 6-2, 4-4, etc. A snippet of the table looks like this:

Date        Employee  Hours worked
2022-01-01  Alice     8
2022-01-02  Alice     8
2022-01-03  Alice     8
2022-01-03  Bob       8
2022-01-04  Alice     8
2022-01-04  Bob       8
2022-01-05  Alice     8
2022-01-05  Bob       8
2022-01-06  Bob       8
2022-01-07  Bob       8
2022-01-08  Bob       8
2022-01-09  Alice     8
2022-01-10  Alice     4
2022-01-11  Alice     8
2022-01-11  Bob       8

So Alice works from January 1st to January 5th and then does not work from January 6th to January 8th (so it's a 5-3 shift), while Bob works from January 3rd to January 8th, and then does not work from January 9th to 10th (so it's a 6-2 shift).

I need to compute with pandas which type of shift a worker is on each day, and the specific day of that shift (e.g., January 3rd is Alice's 3rd day of her shift, but Bob's 1st). The shift type may change every time a shift is completed, meaning that after her 5-3 shift Alice may start a 6-2 shift, and vice versa for Bob.

I tried working with a single employee first to simplify the problem, setting the date as the index and filling the gaps:

df.index = pd.DatetimeIndex(df['Date'])

df = df.reindex(pd.date_range("2022/01/01","2022/03/31"))

Then I create a new column, "working", which takes the value 1 if the employee is working on a given day and 0 if they are not:

# 1 where hours were recorded, 0 on the filled-in days (avoids chained assignment)
df['working'] = df['Hours worked'].notna().astype(int)
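For reference, the two steps above can be run end to end on Alice's rows from the snippet. This is only a minimal sketch: the frame name `alice` is hypothetical, and the date range is shortened to the eleven days covered by the snippet instead of the full quarter.

```python
import pandas as pd

# Alice's rows from the snippet (single-employee case); `alice` is a hypothetical name
alice = pd.DataFrame({
    "Date": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04",
             "2022-01-05", "2022-01-09", "2022-01-10", "2022-01-11"],
    "Hours worked": [8, 8, 8, 8, 8, 8, 4, 8],
})

# Use the dates as the index, then reindex on a daily range to expose the gaps
alice.index = pd.DatetimeIndex(alice["Date"])
alice = alice.reindex(pd.date_range("2022-01-01", "2022-01-11"))

# 1 on days with recorded hours, 0 on the filled-in (non-working) days
alice["working"] = alice["Hours worked"].notna().astype(int)

print(alice["working"].tolist())
# [1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
```

The five 1s followed by three 0s are the 5-3 shift described earlier; the remaining question is how to detect where each such block starts and ends.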

Now my idea was to somehow use a 1-day rolling sum that resets every time a full shift (working plus non-working days) is completed, but I have not managed to do so, and I do not know how to generalize it to all employees without a loop. The final desired output would be something like this:

Date        Employee  Hours worked  Shift type  Shift day
2022-01-01  Alice     8             5-3         1
2022-01-02  Alice     8             5-3         2
2022-01-03  Alice     8             5-3         3
2022-01-03  Bob       8             6-2         1
2022-01-04  Alice     8             5-3         4
2022-01-04  Bob       8             6-2         2
2022-01-05  Alice     8             5-3         5
2022-01-05  Bob       8             6-2         3
2022-01-06  Bob       8             6-2         4
2022-01-07  Bob       8             6-2         5
2022-01-08  Bob       8             6-2         6
2022-01-09  Alice     8             6-2         1
2022-01-10  Alice     4             6-2         2
2022-01-11  Alice     8             6-2         3
2022-01-11  Bob       8             5-3         1

CodePudding user response:

You can group the consecutive days worked per Employee into runs, map the length of each run to a shift type, and then use groupby.cumcount to number the days within each run:

# number of worked days -> shift type
shifts = {3: '3-4', 4: '4-4', 5: '5-3', 6: '6-2'}

# ensure datetime
df['Date'] = pd.to_datetime(df['Date'])

# identify the groups of consecutive days worked by employee
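# group_keys=False keeps the original row index on the result (needed on pandas >= 2.0)
shift = df.groupby('Employee', group_keys=False)['Date'].apply(lambda s: s.diff().ne('1d').cumsum())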

g = df.groupby(['Employee', shift])['Date']
df['Shift type'] = g.transform('count').map(shifts)
df['Shift day'] = g.cumcount().add(1)

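Output (the sample ends partway through the last runs: Alice's final run has only 3 rows, so it maps to '3-4', and Bob's 1-row run has no entry in shifts, hence NaN):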

         Date Employee  Hours worked Shift type  Shift day
0  2022-01-01    Alice             8        5-3          1
1  2022-01-02    Alice             8        5-3          2
2  2022-01-03    Alice             8        5-3          3
3  2022-01-03      Bob             8        6-2          1
4  2022-01-04    Alice             8        5-3          4
5  2022-01-04      Bob             8        6-2          2
6  2022-01-05    Alice             8        5-3          5
7  2022-01-05      Bob             8        6-2          3
8  2022-01-06      Bob             8        6-2          4
9  2022-01-07      Bob             8        6-2          5
10 2022-01-08      Bob             8        6-2          6
11 2022-01-09    Alice             8        3-4          1
12 2022-01-10    Alice             4        3-4          2
13 2022-01-11    Alice             8        3-4          3
14 2022-01-11      Bob             8        NaN          1
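
To reproduce this output without an existing DataFrame, here is a self-contained sketch that builds the sample inline. It follows the same idea as the snippet above but computes the run identifier with groupby.diff and a grouped cumsum instead of apply; the name `run_id` is hypothetical, and the rows are assumed to be sorted by date within each employee.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Date": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-03",
             "2022-01-04", "2022-01-04", "2022-01-05", "2022-01-05",
             "2022-01-06", "2022-01-07", "2022-01-08", "2022-01-09",
             "2022-01-10", "2022-01-11", "2022-01-11"],
    "Employee": ["Alice", "Alice", "Alice", "Bob", "Alice", "Bob", "Alice",
                 "Bob", "Bob", "Bob", "Bob", "Alice", "Alice", "Alice", "Bob"],
    "Hours worked": [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 4, 8, 8],
})
df["Date"] = pd.to_datetime(df["Date"])

# number of worked days -> shift type (same mapping as in the answer)
shifts = {3: "3-4", 4: "4-4", 5: "5-3", 6: "6-2"}

# A new run starts whenever the gap to the employee's previous row is not 1 day
# (the first row of each employee has a NaT gap, which also counts as a start)
starts = df.groupby("Employee")["Date"].diff().ne(pd.Timedelta(days=1))
run_id = starts.astype(int).groupby(df["Employee"]).cumsum().rename("run")

g = df.groupby(["Employee", run_id])["Date"]
df["Shift type"] = g.transform("count").map(shifts)  # run length -> label
df["Shift day"] = g.cumcount() + 1                   # 1-based day within the run

print(df)
```

Because the label comes only from the number of rows in a run, a run that is cut off by the end of the data is labeled by its observed length rather than by the shift the employee is actually on.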