I have the following dataset,
id date1 date2 location
1 2019-06-25 19:15:00 2019-06-25 19:15:00 A
1 2019-06-25 20:35:00 2019-06-25 20:36:00 B
1 2019-06-25 22:15:00 2019-06-26 19:00:00 C
2 2019-06-26 21:15:00 2019-06-26 21:41:00 A
2 2019-06-26 23:29:00 2019-06-25 19:15:00 B
2 2019-06-26 23:30:00 2019-06-27 00:37:00 C
I am trying to create a new column that calculates time difference in minutes by doing (date2 - date1), where the date1 is always from the next row (shift(1)).
Expected output,
id date1 date2 location difference
1 2019-06-25 19:15:00 2019-06-25 19:15:00 A NAN
1 2019-06-25 20:35:00 2019-06-25 20:36:00 B 80
1 2019-06-25 22:15:00 2019-06-26 19:00:00 C 99
2 2019-06-26 21:15:00 2019-06-26 21:41:00 A NAN
2 2019-06-26 23:29:00 2019-06-26 23:29:00 B 108
2 2019-06-26 23:30:00 2019-06-27 00:37:00 C 1
I tried using group, but gives me the wrong output. But without groupby is where i am at,
df['difference'] = ((((df['date1'] -
df['date2'].shift(1)).dt.seconds)/60))
CodePudding user response:
Use:
diff_shift = lambda x: x['date1'].sub(x['date2'].shift()).dt.total_seconds().div(60)
df['difference'] = df.groupby('id').apply(diff_shift).droplevel(0)
print(df)
# Output
id date1 date2 location difference
0 1 2019-06-25 19:15:00 2019-06-25 19:15:00 A NaN
1 1 2019-06-25 20:35:00 2019-06-25 20:36:00 B 80.0
2 1 2019-06-25 22:15:00 2019-06-26 19:00:00 C 99.0
3 2 2019-06-26 21:15:00 2019-06-26 21:41:00 A NaN
4 2 2019-06-26 23:29:00 2019-06-25 19:15:00 B 108.0
5 2 2019-06-26 23:30:00 2019-06-27 00:37:00 C 1695.0
CodePudding user response:
I achieved it with this:
df['difference'] = (df['date1'] - df.groupby('id')['date2'].transform('shift')).dt.seconds / 60