I am trying to calculate the load(or strain?) in a waiting room.
I have the timedate's for arrival
, start of treatment
and finish
(randomly generated below).
I would like to calculate the load in some defined interval (f.ex. per hour).
In pseudo code
for each hour in total time:
load = (arrival[hour] (arrival[<hour] && finish[=>hour])).count()
load
is more or less: how many arrived during the slot how many where still there from previous slots.
The timings are for many months and I would like to get an overview, by aggregating data into one week, hourly slots; I think it's a MultiIndex
of {day, hour}
as for df2
below. This is my MWE.
from datetime import timedelta
import numpy as np
import pandas as pd
# generate random arrival/waiting and finish dates.
startday= pd.to_datetime("1/1/2021")
endday = pd.to_datetime("1/14/2021")
# 10**9 is to convert between ns and s
arrival= pd.DatetimeIndex(
(10**9*np.random.randint(startday.value//10**9, endday.value//10**9, 2000))
).sort_values()
treatment = pd.DatetimeIndex(
[s timedelta(minutes=np.random.randint(5, 55)) for s in arrival]
)
finish = pd.DatetimeIndex(
[s timedelta(minutes=np.random.randint(15, 60)) for s in treatment]
)
# each row shows when a person arrived, how long she waited and when she left
waiting = treatment - arrival
df = pd.DataFrame(
data={
"arrival": arrival,
"treatment": treatment,
"finish": finish,
"waiting": waiting,
},
index=arrival,
)
# group into hourly bins over one week
df2 = pd.DataFrame(
data={
"arrival": df["arrival"]
.groupby([(idx := df["arrival"].dt).weekday, idx.hour])
.count(),
"treatment": df["treatment"]
.groupby([(idx := df["treatment"].dt).weekday, idx.hour])
.count(),
"finish": df["finish"]
.groupby([(idx := df["finish"].dt).weekday, idx.hour])
.count(),
},
).fillna(0)
# fillna fails if waiting is added
df2["waiting"] = (
df["waiting"].groupby([(idx := df["arrival"].dt).weekday, idx.hour]).mean()
)
Help calculating load
would be much appreciated. I spent many hours in vain.
Simple example
arrival = [10.00, 10.15, 10,45, 11.30, 11.45, 12.15]
finish = [10.30, 10.59, 11.45, 12.30, 12.45, 13.00]
Then load = arrival 'people still not finished'
load = {"10": 3 0, "11": 2 1, "12":1 2}
That have to work for all days.
Best regards,
CodePudding user response:
Try this:
df['arrival_date_hour'] = df.arrival.apply(lambda x: x.strftime('%y-%m-%d %H'))
df['finish_date_hour'] = df.finish.apply(lambda x: x.strftime('%y-%m-%d %H'))
result_df = pd.DataFrame({'arrival_date_hour':df.arrival_date_hour.unique(), 'load':None}).set_index('arrival_date_hour')
for datehour in df.arrival_date_hour.unique():
load = df[(df.finish_date_hour >= datehour) & (df.arrival_date_hour <= datehour)].shape[0]
result_df.loc[datehour] = load
Result would look something like:
arrival_date_hour load
21-01-01 00 10
21-01-01 01 18
21-01-01 02 19
21-01-01 03 19
21-01-01 04 21
... ...
21-01-13 19 19
21-01-13 20 20
21-01-13 21 17
21-01-13 22 21
21-01-13 23 20
UPDATE
Try this to get a mean of your load based on weekday and hour:
result_df.reset_index(inplace=True)
result_df['arrival_date_hour'] = result_df.arrival_date_hour.apply(
datetime.strptime,
args=["%y-%m-%d %H"])
result_df['hour'] = result_df.arrival_date_hour.apply(lambda x: x.hour)
result_df['weekday'] = result_df.arrival_date_hour.apply(datetime.weekday)
result_df.groupby(['weekday', 'hour']).load.mean()