I have a dataset in which each row is a monthly observation of a patient, recording whether they tested positive for a disease (Status).
I know the month in which each patient was diagnosed (TimeToDx, i.e. the row number within each ID), and I would like a binary indicator that switches from 0 to 1 starting at the observation month given by TimeToDx.
In other words, for each ID I need to repeat 0 for the first TimeToDx - 1 rows and then repeat 1 for the remaining rows.
Here is some example data - without the status indicator filled:
ID TimeToDx Status
10425 2
10425 2
10425 2
10425 2
10667 3
10667 3
10667 3
10667 3
10667 3
10686 2
10686 2
10686 2
10686 2
10686 2
17096 5
17096 5
17096 5
17096 5
17096 5
Here is what I would like to see:
ID TimeToDx Status
10425 2 0
10425 2 1
10425 2 1
10425 2 1
10667 3 0
10667 3 0
10667 3 1
10667 3 1
10667 3 1
10686 2 0
10686 2 1
10686 2 1
10686 2 1
10686 2 1
17096 5 0
17096 5 0
17096 5 0
17096 5 0
17096 5 1
Any help would be much appreciated.
Answer:
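Here's an approach with dplyr. Grouping by ID, we compare each row's position within its group to TimeToDx, which works because the rows are in month order within each ID. Multiplying the logical result by 1 converts it to a number: TRUE * 1 = 1, FALSE * 1 = 0. Alternatively, you could use mutate(Status = if_else(row_number() >= TimeToDx, 1, 0)).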
library(dplyr)
df %>%
  group_by(ID) %>%                                       # number rows separately within each patient
  mutate(Status = 1 * (row_number() >= TimeToDx)) %>%    # 1 from the diagnosis month onward, otherwise 0
  ungroup()
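If you would rather avoid the dplyr dependency, the same comparison can be sketched in base R. This is a different technique from the pipeline above (it uses ave() to number rows within each ID), and it assumes the rows are already sorted by month within each ID. The names df, row_in_id and expected are illustrative only; the data frame is rebuilt from the example in the question.

```r
# Rebuild the example data from the question (Status not yet filled in).
# The object name `df` is an assumption for illustration.
df <- data.frame(
  ID       = rep(c(10425, 10667, 10686, 17096), times = c(4, 5, 5, 5)),
  TimeToDx = rep(c(2, 3, 2, 5),                 times = c(4, 5, 5, 5))
)

# Number the rows within each ID: 1, 2, 3, ... restarting for every patient.
# This relies on rows already being in month order within each ID.
row_in_id <- ave(seq_along(df$ID), df$ID, FUN = seq_along)

# Status is 0 before the diagnosis month and 1 from that month onward.
df$Status <- as.integer(row_in_id >= df$TimeToDx)

# Desired Status column, copied from the question, for comparison.
expected <- as.integer(c(0, 1, 1, 1,
                         0, 0, 1, 1, 1,
                         0, 1, 1, 1, 1,
                         0, 0, 0, 0, 1))
print(identical(df$Status, expected))
```

as.integer() is used here instead of multiplying by 1 so that Status is stored as an integer; either gives the same 0/1 values.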