I have a dataset with a session id, user id, TimeStamp in UNIX (which I converted using lubridate), and the converted TimeStamp column.
Session | User | ts_UNIX | TimeStamp |
---|---|---|---|
123 | 345 | UNIX Timestamp | 14-06-2022 17:44:32 |
123 | 345 | UNIX Timestamp | 14-06-2022 17:44:33 |
123 | 345 | UNIX Timestamp | 14-06-2022 17:44:37 |
124 | 346 | UNIX Timestamp | 14-06-2022 15:50:10 |
124 | 346 | UNIX Timestamp | 14-06-2022 15:51:01 |
124 | 346 | UNIX Timestamp | 14-06-2022 16:30:00 |
125 | 345 | UNIX Timestamp | 14-06-2022 23:55:30 |
125 | 345 | UNIX Timestamp | 14-06-2022 23:58:50 |
125 | 345 | UNIX Timestamp | 14-06-2022 23:59:45 |
125 | 345 | UNIX Timestamp | 15-06-2022 00:00:32 |
125 | 345 | UNIX Timestamp | 15-06-2022 00:00:59 |
I would like to add another column called session_duration (in seconds) which is the difference between the max_time and min_time grouped by Session and User. For instance, for session # 123 and user 345, the session duration is [14-06-2022 17:44:37] - [14-06-2022 17:44:32] which is 5 seconds.
Session | User | ts_UNIX | TimeStamp | session_duration (seconds) |
---|---|---|---|---|
123 | 345 | UNIX Timestamp | 14-06-2022 17:44:32 | 5 |
123 | 345 | UNIX Timestamp | 14-06-2022 17:44:33 | 5 |
123 | 345 | UNIX Timestamp | 14-06-2022 17:44:37 | 5 |
124 | 346 | UNIX Timestamp | 14-06-2022 15:50:10 | 2390 |
124 | 346 | UNIX Timestamp | 14-06-2022 15:51:01 | 2390 |
124 | 346 | UNIX Timestamp | 14-06-2022 16:30:00 | 2390 |
125 | 345 | UNIX Timestamp | 14-06-2022 23:55:30 | 329 |
125 | 345 | UNIX Timestamp | 14-06-2022 23:58:50 | 329 |
125 | 345 | UNIX Timestamp | 14-06-2022 23:59:45 | 329 |
125 | 345 | UNIX Timestamp | 15-06-2022 00:00:32 | 329 |
125 | 345 | UNIX Timestamp | 15-06-2022 00:00:59 | 329 |
This is what my current code looks like. The timestamp has successfully converted, but I am facing an issue with the session duration column.
library(tidyverse)
library(lubridate)
df <- df %>%
mutate(timestamp = as_datetime(ts_unix/1000)) %>%
group_by (session, user, timestamp) %>%
mutate(session_duration = difftime (max(timestamp), min(timestamp), units = "secs"))
Can someone please help me figure out the session_duration column? Thank you.
CodePudding user response:
library(data.table)
setDT(df)[, duration := max(TimeStamp) - min(TimeStamp), by = .(Session)][]
# Session User ts_UNIX TimeStamp duration
# 1: 123 345 UNIX Timestamp 2022-06-14 17:44:32 5 secs
# 2: 123 345 UNIX Timestamp 2022-06-14 17:44:33 5 secs
# 3: 123 345 UNIX Timestamp 2022-06-14 17:44:37 5 secs
# 4: 124 346 UNIX Timestamp 2022-06-14 15:50:10 2390 secs
# 5: 124 346 UNIX Timestamp 2022-06-14 15:51:01 2390 secs
# 6: 124 346 UNIX Timestamp 2022-06-14 16:30:00 2390 secs
# 7: 125 345 UNIX Timestamp 2022-06-14 23:55:30 329 secs
# 8: 125 345 UNIX Timestamp 2022-06-14 23:58:50 329 secs
# 9: 125 345 UNIX Timestamp 2022-06-14 23:59:45 329 secs
#10: 125 345 UNIX Timestamp 2022-06-15 00:00:32 329 secs
#11: 125 345 UNIX Timestamp 2022-06-15 00:00:59 329 secs
sample data
df <- fread("Session User ts_UNIX TimeStamp
123 345 UNIX Timestamp 14-06-2022T17:44:32
123 345 UNIX Timestamp 14-06-2022T17:44:33
123 345 UNIX Timestamp 14-06-2022T17:44:37
124 346 UNIX Timestamp 14-06-2022T15:50:10
124 346 UNIX Timestamp 14-06-2022T15:51:01
124 346 UNIX Timestamp 14-06-2022T16:30:00
125 345 UNIX Timestamp 14-06-2022T23:55:30
125 345 UNIX Timestamp 14-06-2022T23:58:50
125 345 UNIX Timestamp 14-06-2022T23:59:45
125 345 UNIX Timestamp 15-06-2022T00:00:32
125 345 UNIX Timestamp 15-06-2022T00:00:59")
df[, TimeStamp := as.POSIXct(TimeStamp, format= "%d-%m-%YT%H:%M:%S")]
CodePudding user response:
library(tidyverse)
library(lubridate)
df %>%
group_by(Session, User) %>%
mutate(session_duration = max(TimeStamp) - min(TimeStamp))
# A tibble: 11 × 5
# Groups: Session, User [3]
Session User ts_UNIX TimeStamp session_duration
<dbl> <dbl> <chr> <dttm> <drtn>
1 123 345 UNIX Timestamp 2022-06-14 17:44:32 5 secs
2 123 345 UNIX Timestamp 2022-06-14 17:44:33 5 secs
3 123 345 UNIX Timestamp 2022-06-14 17:44:37 5 secs
4 124 346 UNIX Timestamp 2022-06-14 15:50:10 2390 secs
5 124 346 UNIX Timestamp 2022-06-14 15:51:01 2390 secs
6 124 346 UNIX Timestamp 2022-06-14 16:30:00 2390 secs
7 125 345 UNIX Timestamp 2022-06-14 23:55:30 329 secs
8 125 345 UNIX Timestamp 2022-06-14 23:58:50 329 secs
9 125 345 UNIX Timestamp 2022-06-14 23:59:45 329 secs
10 125 345 UNIX Timestamp 2022-06-15 00:00:32 329 secs
11 125 345 UNIX Timestamp 2022-06-15 00:00:59 329 secs