Say I have the following data.table
:
library(data.table)
DT <- data.table(
ID = rep(c(1,2,3),4),
day = c(rep(1,3),rep(2,3),rep(3,3),rep(4,3)),
Status = c(rep('A',3),'A','B','B','A','C','B','A','D','C')
)
What I would like to achieve is that for each ID
, find number of observations (in this case if sorted by days
, the number of day
it takes to hit a specific Status
. So if I need to do this for Status C
, the result would be:
0
for ID 1
(since doesn't contain status C), 3
for ID 2
, and 4
for ID 3
.
The only way came to my mind was to write a function and do nested for loops
, but I am sure there should be much better/faster/more efficient ways.
Appreciate any help.
CodePudding user response:
A possible data.table
approach adding one column for the number of days to reach each status
(0 if never reached):
library(data.table)
## status id's
status_ids <- unique(DT$Status)
status_cols <- paste("status", status_ids, sep = "_")
## add one column for each status id
setorder(DT, ID, day)
DT[, (status_cols) := lapply(status_ids, \(s) ifelse(any(Status == s), min(day[Status == s]), 0)), by = "ID"]
DT
#> ID day Status status_A status_B status_C status_D
#> 1: 1 1 A 1 0 0 0
#> 2: 1 2 A 1 0 0 0
#> 3: 1 3 A 1 0 0 0
#> 4: 1 4 A 1 0 0 0
#> 5: 2 1 A 1 2 3 4
#> 6: 2 2 B 1 2 3 4
#> 7: 2 3 C 1 2 3 4
#> 8: 2 4 D 1 2 3 4
#> 9: 3 1 A 1 2 4 0
#> 10: 3 2 B 1 2 4 0
#> 11: 3 3 B 1 2 4 0
#> 12: 3 4 C 1 2 4 0
CodePudding user response:
You can split
by ID
and return the first match
of day.
sapply(split(DT[,2:3], DT$ID), \(x) x$day[match("C", x$Status)])
# 1 2 3
#NA 3 4
CodePudding user response:
Does this work:
library(dplyr)
DT %>% left_join(
DT %>% group_by(ID) %>% summarise(col = row_number()[Status == 'C'])
) %>% replace_na(list(col= 0))
`summarise()` has grouped output by 'ID'. You can override using the
`.groups` argument.
Joining, by = "ID"
ID day Status col
1: 1 1 A 0
2: 2 1 A 3
3: 3 1 A 4
4: 1 2 A 0
5: 2 2 B 3
6: 3 2 B 4
7: 1 3 A 0
8: 2 3 C 3
9: 3 3 B 4
10: 1 4 A 0
11: 2 4 D 3
12: 3 4 C 4