I have employee id, their clock in, and clock out timings by day. I want to calculate number of employee present in office by hour by Date.
Example Data
import pandas as pd
data1 = {'emp_id': ['Employee 1', 'Employee 2', 'Employee 3', 'Employee 4', 'Employee 5'],
'Clockin': ['12/5/2021 0:08','8/7/2021 0:04','3/30/2021 1:24','12/23/2021 22:45', '12/23/2021 23:29'],
'Clockout': ['12/5/2021 3:28','8/7/2021 0:34','3/30/2021 4:37','12/24/2021 0:42', '12/24/2021 1:42']}
df1 = pd.DataFrame(data1)
Example of output
import pandas as pd
data2 = {'Date': ['12/5/2021', '8/7/2021', '3/30/2021','3/30/2021','3/30/2021','3/30/2021', '12/23/2021','12/23/2021','12/24/2021','12/24/2021'],
'Hour': ['01:00','01:00','02:00','03:00','04:00','05:00', '22:00','23:00', '01:00','02:00'],
'emp_count': [1,1,1,1,1,1,1,2, 2,1]}
df2 = pd.DataFrame(data2)
CodePudding user response:
Try this:
# Round clock in DOWN to the nearest PRECEDING hour
clock_in = pd.to_datetime(df1["Clockin"]).dt.floor("H")
# Round clock out UP to the nearest SUCCEEDING hour
clock_out = pd.to_datetime(df1["Clockout"]).dt.ceil("H")
# Generate time series at hourly frequency between adjusted clock in and clock
# out time
hours = pd.Series(
[
pd.date_range(in_, out_, freq="H", inclusive="right")
for in_, out_ in zip(clock_in, clock_out)
]
).explode()
# Final result
hours.groupby(hours).count()
Result:
2021-03-30 02:00:00 1
2021-03-30 03:00:00 1
2021-03-30 04:00:00 1
2021-03-30 05:00:00 1
2021-08-07 01:00:00 1
2021-12-05 01:00:00 1
2021-12-05 02:00:00 1
2021-12-05 03:00:00 1
2021-12-05 04:00:00 1
2021-12-23 23:00:00 1
2021-12-24 00:00:00 2
2021-12-24 01:00:00 2
2021-12-24 02:00:00 1
dtype: int64
It's slightly different from your expected output but consistent with your business rules.