Home > Net >  Pandas groupby find difference between two date time columns
Pandas groupby find difference between two date time columns

Time:03-29

I have the following dataset,

id      date1                date2              location
1   2019-06-25 19:15:00  2019-06-25 19:15:00       A
1   2019-06-25 20:35:00  2019-06-25 20:36:00       B
1   2019-06-25 22:15:00  2019-06-26 19:00:00       C
2   2019-06-26 21:15:00  2019-06-26 21:41:00       A
2   2019-06-26 23:29:00  2019-06-25 19:15:00       B
2   2019-06-26 23:30:00  2019-06-27 00:37:00       C

I am trying to create a new column that calculates time difference in minutes by doing (date2 - date1), where the date1 is always from the next row (shift(1)).

Expected output,

id      date1                date2              location    difference
1   2019-06-25 19:15:00  2019-06-25 19:15:00       A           NAN
1   2019-06-25 20:35:00  2019-06-25 20:36:00       B           80
1   2019-06-25 22:15:00  2019-06-26 19:00:00       C           99
2   2019-06-26 21:15:00  2019-06-26 21:41:00       A           NAN
2   2019-06-26 23:29:00  2019-06-26 23:29:00       B           108
2   2019-06-26 23:30:00  2019-06-27 00:37:00       C           1

I tried using group, but gives me the wrong output. But without groupby is where i am at,

df['difference'] = ((((df['date1'] - 
                    df['date2'].shift(1)).dt.seconds)/60))

CodePudding user response:

Use:

diff_shift = lambda x: x['date1'].sub(x['date2'].shift()).dt.total_seconds().div(60)
df['difference'] = df.groupby('id').apply(diff_shift).droplevel(0)
print(df)

# Output
   id               date1               date2 location  difference
0   1 2019-06-25 19:15:00 2019-06-25 19:15:00        A         NaN
1   1 2019-06-25 20:35:00 2019-06-25 20:36:00        B        80.0
2   1 2019-06-25 22:15:00 2019-06-26 19:00:00        C        99.0
3   2 2019-06-26 21:15:00 2019-06-26 21:41:00        A         NaN
4   2 2019-06-26 23:29:00 2019-06-25 19:15:00        B       108.0
5   2 2019-06-26 23:30:00 2019-06-27 00:37:00        C      1695.0

CodePudding user response:

I achieved it with this:

df['difference'] = (df['date1'] - df.groupby('id')['date2'].transform('shift')).dt.seconds / 60
  • Related