person_id diag_date concept_id event diff_prev_event
1: 1 2012-01-15 4265600 comorb NA secs
2: 1 2012-01-15 201820 comorb 0 secs
3: 1 2012-03-15 4265600 comorb 5184000 secs
4: 2 2012-03-15 201820 comorb NA secs
5: 2 2012-06-22 201820 comorb 8553600 secs
6: 2 2012-06-22 4265600 comorb 0 secs
I am trying to calculate the days since last event for each person. I am running into two issues.
- The time difference is showing in seconds. I need to get the days. (5184000 secs = 30 days)
- If two days are same dates then the 2nd one is showing 0 when it should be looking at the different date. Row 5 and 6 is same date so they would have the same date difference.
This is the code I tried:
dt[order(diag_date),diff_prev_event := difftime(diag_date, lag( diag_date)), by = c("person_id") ]
CodePudding user response:
Specify the units
library(data.table)
dt[order(diag_date),diff_prev_event := difftime(diag_date,
lag( diag_date), units = 'days'), by = c("person_id") ]
Then, we grouped by 'person_id' and 'diag_date' and change the values to the max
if there are more than one row
dt[, diff_prev_event := if(.N > 1) max(diff_prev_event,
na.rm = TRUE) else diff_prev_event, .(person_id, diag_date)]
> dt
person_id diag_date concept_id event diff_prev_event
<int> <Date> <int> <char> <difftime>
1: 1 2012-01-15 4265600 comorb 0 days
2: 1 2012-01-15 201820 comorb 0 days
3: 1 2012-03-15 4265600 comorb 60 days
4: 2 2012-03-15 201820 comorb NA days
5: 2 2012-06-22 201820 comorb 99 days
6: 2 2012-06-22 4265600 comorb 99 days
-output
data
dt <- structure(list(person_id = c(1L, 1L, 1L, 2L, 2L, 2L), diag_date = structure(c(15354,
15354, 15414, 15414, 15513, 15513), class = "Date"), concept_id = c(4265600L,
201820L, 4265600L, 201820L, 201820L, 4265600L), event = c("comorb",
"comorb", "comorb", "comorb", "comorb", "comorb")), row.names = c(NA,
-6L), class = c("data.table", "data.frame"))