I have a pandas DataFrame with database logs for updates made to a database. I'd like to find a surrogate for session length by using the datetime of a login entry to the time of the last entry before the next login.
log_id staff_id log_date action_name
559932 1 2019-01-05 06:32:36.960 Login
559933 1 2019-01-05 06:33:14.723 Update
559935 1 2019-01-05 06:34:49.770 Insert
559937 1 2019-01-05 06:40:36.503 Update
559938 1 2019-01-05 06:41:11.443 Update
559957 1 2019-01-05 07:24:12.190 Login
559958 1 2019-01-05 07:24:25.673 Update
559960 1 2019-01-05 07:25:01.673 Update
559963 1 2019-01-05 07:25:49.880 Update
559964 1 2019-01-05 07:25:49.897 Accepted
559966 1 2019-01-05 07:27:42.860 Insert
559967 1 2019-01-05 07:27:42.860 Insert
559968 1 2019-01-05 07:27:42.903 Admitted
559969 1 2019-01-05 07:30:23.173 Insert
559970 1 2019-01-05 07:30:37.643 Insert
559971 1 2019-01-05 07:31:18.640 Update
559972 1 2019-01-05 07:31:31.033 Update
570170 2 2019-01-25 22:52:10.160 Login
Ideally would be able to groupby
"staff_id" and subtract the datetime of a login entry from the datetime entry of a shifted -1 from the next login. But I can't seem to get what I'm looking for with the groupby
diff
function. Any thoughts would be appreciated.
CodePudding user response:
IIUC, you want the time difference between each login entries; so a cumsum
on "action_name" column to derive session IDs and groupby
it and find the difference between last and first time in each group should do the job:
df['log_date'] = pd.to_datetime(df['log_date'])
df['session_length'] = (df.groupby(['staff_id', df['action_name'].eq('Login').cumsum()])['log_date']
.transform(lambda x: x.iat[-1] - x.iat[0])
.astype(str).str.extract('days (.*)'))
Output:
log_id staff_id log_date action_name session_length
0 559932 1 2019-01-05 06:32:36.960 Login 00:08:34.483000
1 559933 1 2019-01-05 06:33:14.723 Update 00:08:34.483000
2 559935 1 2019-01-05 06:34:49.770 Insert 00:08:34.483000
3 559937 1 2019-01-05 06:40:36.503 Update 00:08:34.483000
4 559938 1 2019-01-05 06:41:11.443 Update 00:08:34.483000
5 559957 1 2019-01-05 07:24:12.190 Login 00:07:18.843000
6 559958 1 2019-01-05 07:24:25.673 Update 00:07:18.843000
7 559960 1 2019-01-05 07:25:01.673 Update 00:07:18.843000
8 559963 1 2019-01-05 07:25:49.880 Update 00:07:18.843000
9 559964 1 2019-01-05 07:25:49.897 Accepted 00:07:18.843000
10 559966 1 2019-01-05 07:27:42.860 Insert 00:07:18.843000
11 559967 1 2019-01-05 07:27:42.860 Insert 00:07:18.843000
12 559968 1 2019-01-05 07:27:42.903 Admitted 00:07:18.843000
13 559969 1 2019-01-05 07:30:23.173 Insert 00:07:18.843000
14 559970 1 2019-01-05 07:30:37.643 Insert 00:07:18.843000
15 559971 1 2019-01-05 07:31:18.640 Update 00:07:18.843000
16 559972 1 2019-01-05 07:31:31.033 Update 00:07:18.843000
17 570170 2 2019-01-25 22:52:10.160 Login 00:00:00