Use mask to count hourly slots from multiple datetime columns-CodePudding

I am trying to calculate the load(or strain?) in a waiting room. I have the timedate's for arrival, start of treatment and finish(randomly generated below).

I would like to calculate the load in some defined interval (f.ex. per hour).

In pseudo code

for each hour in total time:
    load = (arrival[hour]   (arrival[<hour] && finish[=>hour])).count()

load is more or less: how many arrived during the slot how many where still there from previous slots.

The timings are for many months and I would like to get an overview, by aggregating data into one week, hourly slots; I think it's a MultiIndex of {day, hour} as for df2 below. This is my MWE.

from datetime import timedelta
import numpy as np
import pandas as pd

# generate random arrival/waiting and finish dates.
startday= pd.to_datetime("1/1/2021")
endday = pd.to_datetime("1/14/2021")

# 10**9 is to convert between ns and s
arrival= pd.DatetimeIndex(
        (10**9*np.random.randint(startday.value//10**9, endday.value//10**9, 2000))
    ).sort_values()
treatment = pd.DatetimeIndex(
    [s   timedelta(minutes=np.random.randint(5, 55)) for s in arrival]
)
finish = pd.DatetimeIndex(
    [s   timedelta(minutes=np.random.randint(15, 60)) for s in treatment]
)

# each row shows when a person arrived, how long she waited and when she left
waiting = treatment - arrival
df = pd.DataFrame(
    data={
        "arrival": arrival,
        "treatment": treatment,
        "finish": finish,
        "waiting": waiting,
    },
    index=arrival,
)

# group into hourly bins over one week
df2 = pd.DataFrame(
    data={
        "arrival": df["arrival"]
        .groupby([(idx := df["arrival"].dt).weekday, idx.hour])
        .count(),
        "treatment": df["treatment"]
        .groupby([(idx := df["treatment"].dt).weekday, idx.hour])
        .count(),
        "finish": df["finish"]
        .groupby([(idx := df["finish"].dt).weekday, idx.hour])
        .count(),
    },
).fillna(0)
# fillna fails if waiting is added
df2["waiting"] = (
    df["waiting"].groupby([(idx := df["arrival"].dt).weekday, idx.hour]).mean()
)

Help calculating load would be much appreciated. I spent many hours in vain.

Simple example

arrival = [10.00, 10.15, 10,45, 11.30, 11.45, 12.15]
finish =  [10.30, 10.59, 11.45, 12.30, 12.45, 13.00]

Then load = arrival 'people still not finished'

load = {"10": 3 0, "11": 2 1, "12":1 2}

That have to work for all days.

Best regards,

CodePudding user response：

Try this:

df['arrival_date_hour'] = df.arrival.apply(lambda x: x.strftime('%y-%m-%d %H'))
df['finish_date_hour'] = df.finish.apply(lambda x: x.strftime('%y-%m-%d %H'))

result_df = pd.DataFrame({'arrival_date_hour':df.arrival_date_hour.unique(), 'load':None}).set_index('arrival_date_hour')

for datehour in df.arrival_date_hour.unique():
    load = df[(df.finish_date_hour >= datehour) & (df.arrival_date_hour <= datehour)].shape[0]
    result_df.loc[datehour] = load

Result would look something like:

arrival_date_hour   load
21-01-01 00         10
21-01-01 01         18
21-01-01 02         19
21-01-01 03         19
21-01-01 04         21
...                 ...
21-01-13 19         19
21-01-13 20         20
21-01-13 21         17
21-01-13 22         21
21-01-13 23         20

UPDATE

Try this to get a mean of your load based on weekday and hour:

result_df.reset_index(inplace=True)
result_df['arrival_date_hour'] = result_df.arrival_date_hour.apply(
    datetime.strptime,
    args=["%y-%m-%d %H"])
result_df['hour'] = result_df.arrival_date_hour.apply(lambda x: x.hour)
result_df['weekday'] = result_df.arrival_date_hour.apply(datetime.weekday)
result_df.groupby(['weekday', 'hour']).load.mean()