Home > Enterprise >  Calculating Session Duration in R
Calculating Session Duration in R

Time:06-23

I have a dataset with a session id, user id, TimeStamp in UNIX (which I converted using lubridate), and the converted TimeStamp column.

Session User ts_UNIX TimeStamp
123 345 UNIX Timestamp 14-06-2022 17:44:32
123 345 UNIX Timestamp 14-06-2022 17:44:33
123 345 UNIX Timestamp 14-06-2022 17:44:37
124 346 UNIX Timestamp 14-06-2022 15:50:10
124 346 UNIX Timestamp 14-06-2022 15:51:01
124 346 UNIX Timestamp 14-06-2022 16:30:00
125 345 UNIX Timestamp 14-06-2022 23:55:30
125 345 UNIX Timestamp 14-06-2022 23:58:50
125 345 UNIX Timestamp 14-06-2022 23:59:45
125 345 UNIX Timestamp 15-06-2022 00:00:32
125 345 UNIX Timestamp 15-06-2022 00:00:59

I would like to add another column called session_duration (in seconds) which is the difference between the max_time and min_time grouped by Session and User. For instance, for session # 123 and user 345, the session duration is [14-06-2022 17:44:37] - [14-06-2022 17:44:32] which is 5 seconds.

Session User ts_UNIX TimeStamp session_duration (seconds)
123 345 UNIX Timestamp 14-06-2022 17:44:32 5
123 345 UNIX Timestamp 14-06-2022 17:44:33 5
123 345 UNIX Timestamp 14-06-2022 17:44:37 5
124 346 UNIX Timestamp 14-06-2022 15:50:10 2390
124 346 UNIX Timestamp 14-06-2022 15:51:01 2390
124 346 UNIX Timestamp 14-06-2022 16:30:00 2390
125 345 UNIX Timestamp 14-06-2022 23:55:30 329
125 345 UNIX Timestamp 14-06-2022 23:58:50 329
125 345 UNIX Timestamp 14-06-2022 23:59:45 329
125 345 UNIX Timestamp 15-06-2022 00:00:32 329
125 345 UNIX Timestamp 15-06-2022 00:00:59 329

This is what my current code looks like. The timestamp has successfully converted, but I am facing an issue with the session duration column.

library(tidyverse)
library(lubridate)
df <- df %>%
  mutate(timestamp = as_datetime(ts_unix/1000)) %>%
  group_by (session, user, timestamp) %>%
  mutate(session_duration = difftime (max(timestamp), min(timestamp), units = "secs"))

Can someone please help me figure out the session_duration column? Thank you.

CodePudding user response:

library(data.table)
setDT(df)[, duration := max(TimeStamp) - min(TimeStamp), by = .(Session)][]

#    Session User        ts_UNIX           TimeStamp  duration
# 1:     123  345 UNIX Timestamp 2022-06-14 17:44:32    5 secs
# 2:     123  345 UNIX Timestamp 2022-06-14 17:44:33    5 secs
# 3:     123  345 UNIX Timestamp 2022-06-14 17:44:37    5 secs
# 4:     124  346 UNIX Timestamp 2022-06-14 15:50:10 2390 secs
# 5:     124  346 UNIX Timestamp 2022-06-14 15:51:01 2390 secs
# 6:     124  346 UNIX Timestamp 2022-06-14 16:30:00 2390 secs
# 7:     125  345 UNIX Timestamp 2022-06-14 23:55:30  329 secs
# 8:     125  345 UNIX Timestamp 2022-06-14 23:58:50  329 secs
# 9:     125  345 UNIX Timestamp 2022-06-14 23:59:45  329 secs
#10:     125  345 UNIX Timestamp 2022-06-15 00:00:32  329 secs
#11:     125  345 UNIX Timestamp 2022-06-15 00:00:59  329 secs

sample data

df <- fread("Session    User    ts_UNIX     TimeStamp
123     345     UNIX Timestamp  14-06-2022T17:44:32
123     345     UNIX Timestamp  14-06-2022T17:44:33
123     345     UNIX Timestamp  14-06-2022T17:44:37
124     346     UNIX Timestamp  14-06-2022T15:50:10
124     346     UNIX Timestamp  14-06-2022T15:51:01
124     346     UNIX Timestamp  14-06-2022T16:30:00
125     345     UNIX Timestamp  14-06-2022T23:55:30
125     345     UNIX Timestamp  14-06-2022T23:58:50
125     345     UNIX Timestamp  14-06-2022T23:59:45
125     345     UNIX Timestamp  15-06-2022T00:00:32
125     345     UNIX Timestamp  15-06-2022T00:00:59")

df[, TimeStamp := as.POSIXct(TimeStamp, format= "%d-%m-%YT%H:%M:%S")]

CodePudding user response:

library(tidyverse)
library(lubridate)

df %>% 
  group_by(Session, User) %>% 
  mutate(session_duration = max(TimeStamp) - min(TimeStamp))

# A tibble: 11 × 5
# Groups:   Session, User [3]
   Session  User ts_UNIX        TimeStamp           session_duration
     <dbl> <dbl> <chr>          <dttm>              <drtn>          
 1     123   345 UNIX Timestamp 2022-06-14 17:44:32    5 secs       
 2     123   345 UNIX Timestamp 2022-06-14 17:44:33    5 secs       
 3     123   345 UNIX Timestamp 2022-06-14 17:44:37    5 secs       
 4     124   346 UNIX Timestamp 2022-06-14 15:50:10 2390 secs       
 5     124   346 UNIX Timestamp 2022-06-14 15:51:01 2390 secs       
 6     124   346 UNIX Timestamp 2022-06-14 16:30:00 2390 secs       
 7     125   345 UNIX Timestamp 2022-06-14 23:55:30  329 secs       
 8     125   345 UNIX Timestamp 2022-06-14 23:58:50  329 secs       
 9     125   345 UNIX Timestamp 2022-06-14 23:59:45  329 secs       
10     125   345 UNIX Timestamp 2022-06-15 00:00:32  329 secs       
11     125   345 UNIX Timestamp 2022-06-15 00:00:59  329 secs       
  • Related