I have a table like so:
ID Timestamp Status
A 5/30/2022 2:29 Run Ended
A 5/30/2022 0:23 In Progress
A 5/30/2022 0:22 Prepared
B 5/30/2022 11:15 Run Ended
B 5/30/2022 9:18 In Progress
B 5/30/2022 0:55 Prepared
I want to compute the duration between each status grouped by the ID. So the resulting output table would be:
ID Duration(min) Status change
A 0.40 In Progress-Prepared
A 125.82 Run Ended - In Progress
B 502.78 In Progress-Prepared
B 117.34 Run Ended - In Progress
How do I order it by descending timestamp (grouped by ID) and then subtract the last row from the previous row all the way upto the top for each ID group?
CodePudding user response:
You can use a groupby.diff
and groupby.shift
:
out = (df
.assign(**{'Duration(min)': pd.to_datetime(df['Timestamp'], dayfirst=False)
.groupby(df['ID'])
.diff(-1).dt.total_seconds() # diff in seconds to next time in group
.div(60), # convert to minutes
'Status change': df.groupby('ID')['Status'].shift(-1) '-' df['Status']
})
.dropna(subset='Duration(min)') # get rid of empty rows
[['ID', 'Duration(min)', 'Status change']]
)
Output:
ID Duration(min) Status change
0 A 126.0 In Progress-Run Ended
1 A 1.0 Prepared-In Progress
3 B 117.0 In Progress-Run Ended
4 B 503.0 Prepared-In Progress
CodePudding user response:
You can use groupby('ID')[value].shift(1)
to access the previous value
in the same ID
group.
import pandas as pd
df = pd.DataFrame({
'ID': ['a','a','a','b','b','b'],
'time': [1,2,3,1,4,5],
'status': ['x','y','z','xx','yy','zz']
})
df['previous_time'] = df.groupby('ID')['time'].shift(1)
df['previous_status'] = df.groupby('ID')['status'].shift(1)
df = df.dropna()
df['duration'] = df['time'] - df['previous_time'] # change this line to calculate duration between time instead
df['status_change'] = df['previous_status'] '-' df['status']
print (df[['ID','duration','status_change']].to_markdown(index=False))
Output:
ID | duration | status_change |
---|---|---|
a | 1 | x-y |
a | 1 | y-z |
b | 3 | xx-yy |
b | 1 | yy-zz |
PS. you can subtract time
and previous_time
with answer in this thread