I have a raw table as show below:
ColumnA(Index) | StarTime | EndTime |
---|---|---|
A | 2022-03-16 13:07:28 | 2022-03-16 13:26:10 |
A | 2022-03-16 13:38:28 | 2022-03-16 13:40:28 |
B | 2022-03-16 14:01:28 | 2022-03-16 14:10:28 |
C | 2022-03-16 14:19:28 | 2022-03-16 14:29:28 |
C | 2022-03-16 18:10:28 | 2022-03-16 18:18:28 |
C | 2022-03-16 18:28:28 | 2022-03-16 18:50:28 |
Question
I am seeking help to create a new data frame where there is a new column for each Index item called Interval (in minutes) which is equal to StartTime of the next row of index item(if it exists) minus EndTime of the previous row Item. For the first row of each index, the Interval should be 0.
Expected Output Table:
ColumnA(Index) | StarTime | EndTime | Interval(mins) |
---|---|---|---|
A | 2022-03-16 13:07:28 | 2022-03-16 13:26:10 | 0 |
A | 2022-03-16 13:38:28 | 2022-03-16 13:40:28 | 12 |
B | 2022-03-16 14:01:28 | 2022-03-16 14:10:28 | 0 |
C | 2022-03-16 14:19:28 | 2022-03-16 14:29:28 | 0 |
C | 2022-03-16 18:10:28 | 2022-03-16 18:18:28 | 10 |
C | 2022-03-16 18:28:28 | 2022-03-16 18:50:28 | 10 |
Any help is appreciated.
CodePudding user response:
Try:
- convert date columns to datetime if needed
- sort by index and start time
- calculate the differences of current row "StarTime" with previous row "EndTime" and assign where the index is the same.
df["StarTime"] = pd.to_datetime(df["StarTime"])
df["EndTime"] = pd.to_datetime(df["EndTime"])
df = df.sort_values(["ColumnA(Index)","StarTime"])
df["Interval(mins)"] = df["StarTime"].sub(df["EndTime"].shift()).dt.total_seconds().div(60).where(df["ColumnA(Index)"].eq(df["ColumnA(Index)"].shift())).fillna(0)
>>> df
ColumnA(Index) StarTime EndTime Interval(mins)
0 A 2022-03-16 13:07:28 2022-03-16 13:26:10 0.0
1 A 2022-03-16 13:38:28 2022-03-16 13:40:28 12.3
2 B 2022-03-16 14:01:28 2022-03-16 14:10:28 0.0
3 C 2022-03-16 14:19:28 2022-03-16 14:29:28 0.0
4 C 2022-03-16 18:10:28 2022-03-16 18:18:28 221.0
5 C 2022-03-16 18:28:28 2022-03-16 18:50:28 10.0