Context
Datetime | Campaign_name | Status | Open_time |
---|---|---|---|
2022-03-15 00:00 | Funny_campaign | Open | |
2022-03-15 01:00 | Funny_campaign | Continue | |
2022-03-15 02:00 | Funny_campaign | Continue | |
2022-03-15 03:00 | Funny_campaign | Continue | |
2022-03-15 04:00 | Funny_campaign | Close | |
2022-03-15 08:00 | Funny_campaign | Open | |
2022-03-15 09:00 | Funny_campaign | Continue | |
2022-03-15 10:00 | Funny_campaign | Close |
Problem
I need to calculate the time from open to close.
My code right now
There are two approches I could go with. Get the open time in each 'Close' or a cumulative open_time in each 'Open' and 'Continue'. Here is my take on the last one.
My code right now is almost fine, it doesn't count the time between Close and Open but it forgets to sum the last time difference.
df["Datetime"] = pd.to_datetime(df["Datetime"])
df["time_diff"] = df["Datetime"].diff()
df["time_diff"] = df["time_diff"].astype("timedelta64[m]").fillna(0)
condition = df["Status"] == "Close"
df.loc[condition, "time_diff"] = 0
df["Cumulative time"] = df.groupby(["Campaign_name"])["time_diff"].cumsum()
df = df.drop("time_diff", 1)
CodePudding user response:
IIUC, you could start new groups on the opens and use:
df['Datetime'] = pd.to_datetime(df['Datetime'])
group = df['Status'].eq('Open').cumsum()
df['Open_time'] = df.groupby(group)['Datetime'].apply(lambda g: g-g.iloc[0])
# or, alternative syntax
# df['Open_time'] = df.groupby(group)['Datetime'].apply(lambda g: g.diff().cumsum())
Output:
Datetime Campaign_name Status Open_time
0 2022-03-15 00:00:00 Funny_campaign Open 0 days 00:00:00
1 2022-03-15 01:00:00 Funny_campaign Continue 0 days 01:00:00
2 2022-03-15 02:00:00 Funny_campaign Continue 0 days 02:00:00
3 2022-03-15 03:00:00 Funny_campaign Continue 0 days 03:00:00
4 2022-03-15 04:00:00 Funny_campaign Close 0 days 04:00:00
5 2022-03-15 08:00:00 Funny_campaign Open 0 days 00:00:00
6 2022-03-15 09:00:00 Funny_campaign Continue 0 days 01:00:00
7 2022-03-15 10:00:00 Funny_campaign Close 0 days 02:00:00
Or to only assign to "Close":
df.loc[df['Status'].eq('Close'), 'Open_time'] = df.groupby(group)['Datetime'].apply(lambda g: g-g.iloc[0])
Output:
Datetime Campaign_name Status Open_time
0 2022-03-15 00:00:00 Funny_campaign Open NaN
1 2022-03-15 01:00:00 Funny_campaign Continue NaN
2 2022-03-15 02:00:00 Funny_campaign Continue NaN
3 2022-03-15 03:00:00 Funny_campaign Continue NaN
4 2022-03-15 04:00:00 Funny_campaign Close 0 days 04:00:00
5 2022-03-15 08:00:00 Funny_campaign Open NaN
6 2022-03-15 09:00:00 Funny_campaign Continue NaN
7 2022-03-15 10:00:00 Funny_campaign Close 0 days 02:00:00
And for just the difference close-open for each group:
df.groupby(group)['Datetime'].agg(lambda g: g.iloc[-1]-g.iloc[0])
Output:
Status
1 0 days 04:00:00
2 0 days 02:00:00
Name: Datetime, dtype: timedelta64[ns]