Home > Net >  Calculate difference between datetime based on column in pandas
Calculate difference between datetime based on column in pandas

Time:03-08

I have a pandas DataFrame with database logs for updates made to a database. I'd like to find a surrogate for session length by using the datetime of a login entry to the time of the last entry before the next login.

log_id  staff_id    log_date    action_name
559932  1   2019-01-05 06:32:36.960 Login
559933  1   2019-01-05 06:33:14.723 Update
559935  1   2019-01-05 06:34:49.770 Insert
559937  1   2019-01-05 06:40:36.503 Update
559938  1   2019-01-05 06:41:11.443 Update
559957  1   2019-01-05 07:24:12.190 Login
559958  1   2019-01-05 07:24:25.673 Update
559960  1   2019-01-05 07:25:01.673 Update
559963  1   2019-01-05 07:25:49.880 Update
559964  1   2019-01-05 07:25:49.897 Accepted
559966  1   2019-01-05 07:27:42.860 Insert
559967  1   2019-01-05 07:27:42.860 Insert
559968  1   2019-01-05 07:27:42.903 Admitted
559969  1   2019-01-05 07:30:23.173 Insert
559970  1   2019-01-05 07:30:37.643 Insert
559971  1   2019-01-05 07:31:18.640 Update
559972  1   2019-01-05 07:31:31.033 Update
570170  2   2019-01-25 22:52:10.160 Login

Ideally would be able to groupby "staff_id" and subtract the datetime of a login entry from the datetime entry of a shifted -1 from the next login. But I can't seem to get what I'm looking for with the groupby diff function. Any thoughts would be appreciated.

CodePudding user response:

IIUC, you want the time difference between each login entries; so a cumsum on "action_name" column to derive session IDs and groupby it and find the difference between last and first time in each group should do the job:

df['log_date'] = pd.to_datetime(df['log_date'])
df['session_length'] = (df.groupby(['staff_id', df['action_name'].eq('Login').cumsum()])['log_date']
                         .transform(lambda x: x.iat[-1] - x.iat[0])
                         .astype(str).str.extract('days (.*)'))

Output:

    log_id  staff_id                log_date action_name   session_length
0   559932         1 2019-01-05 06:32:36.960       Login  00:08:34.483000
1   559933         1 2019-01-05 06:33:14.723      Update  00:08:34.483000
2   559935         1 2019-01-05 06:34:49.770      Insert  00:08:34.483000
3   559937         1 2019-01-05 06:40:36.503      Update  00:08:34.483000
4   559938         1 2019-01-05 06:41:11.443      Update  00:08:34.483000
5   559957         1 2019-01-05 07:24:12.190       Login  00:07:18.843000
6   559958         1 2019-01-05 07:24:25.673      Update  00:07:18.843000
7   559960         1 2019-01-05 07:25:01.673      Update  00:07:18.843000
8   559963         1 2019-01-05 07:25:49.880      Update  00:07:18.843000
9   559964         1 2019-01-05 07:25:49.897    Accepted  00:07:18.843000
10  559966         1 2019-01-05 07:27:42.860      Insert  00:07:18.843000
11  559967         1 2019-01-05 07:27:42.860      Insert  00:07:18.843000
12  559968         1 2019-01-05 07:27:42.903    Admitted  00:07:18.843000
13  559969         1 2019-01-05 07:30:23.173      Insert  00:07:18.843000
14  559970         1 2019-01-05 07:30:37.643      Insert  00:07:18.843000
15  559971         1 2019-01-05 07:31:18.640      Update  00:07:18.843000
16  559972         1 2019-01-05 07:31:31.033      Update  00:07:18.843000
17  570170         2 2019-01-25 22:52:10.160       Login         00:00:00
  • Related