Consider the following sample dataset.
*id represents an individual's identifier.
*Surv_time represents an individual's survival time.
*start represents the time at which zj is measured. zj is a time-varying covariate.
rm(list=ls()); set.seed(1)
n<-5
Surv_time<-round( runif( n, 12 , 20 ) ) #Survival time
dat<-data.frame(id=1:n, Surv_time )
ntp<- rep(3, n) # three measurements per individual.
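mat<-matrix(ncol=2,nrow=1) # placeholder first row, removed later with w[-1,]
m=0; w <- mat
for(l in ntp)
{
  m=m+1 # individual counter, used as id
  ft<- seq(from = runif(1,0,8), to = runif(1,12,20) , length.out = l)
  st<-round(ft) # rounded measurement times
  matid<-cbind( matrix(st,ncol=1 ) ,m)
  w<-rbind(w,matid)
}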
d<-data.frame(w[-1,])
colnames(d)<-c("start","id")
D <- merge(d,dat,by="id") #merging dataset
D$zj <- with(D, 0.3*start)
D
   id start Surv_time  zj
1   1     7        14 2.1
2   1    13        14 3.9
3   1    20        14 6.0
4   2     5        15 1.5
5   2    11        15 3.3
6   2    17        15 5.1
7   3     0        17 0.0
8   3     7        17 2.1
9   3    14        17 4.2
10  4     1        19 0.3
11  4     9        19 2.7
12  4    17        19 5.1
13  5     3        14 0.9
14  5    11        14 3.3
15  5    18        14 5.4
I need code to transform the data into start-stop format, where the last stop for each individual is that individual's Surv_time. The idea is to create start-stop intervals in which the stop of one interval is the start of the next. Measurements taken after Surv_time (for example start = 20 for id 1) should be dropped. I should end up with:
   id start stop Surv_time  zj
1   1     7   13        14 2.1
2   1    13   14        14 3.9
4   2     5   11        15 1.5
5   2    11   15        15 3.3
7   3     0    7        17 0.0
8   3     7   14        17 2.1
9   3    14   17        17 4.2
10  4     1    9        19 0.3
11  4     9   17        19 2.7
12  4    17   19        19 5.1
13  5     3   11        14 0.9
14  5    11   14        14 3.3
CodePudding user response:
We can use dplyr:
library(dplyr)
D %>%
  group_by(id) %>%
  mutate(stop = lead(start, default = Inf),
         stop = ifelse(stop > Surv_time, Surv_time, stop), .after = start) %>%
  filter(start < stop) %>%
  ungroup()
# A tibble: 12 × 5
      id start  stop Surv_time    zj
   <dbl> <dbl> <dbl>     <dbl> <dbl>
 1     1     7    13        14   2.1
 2     1    13    14        14   3.9
 3     2     5    11        15   1.5
 4     2    11    15        15   3.3
 5     3     0     7        17   0
 6     3     7    14        17   2.1
 7     3    14    17        17   4.2
 8     4     1     9        19   0.3
 9     4     9    17        19   2.7
10     4    17    19        19   5.1
11     5     3    11        14   0.9
12     5    11    14        14   3.3
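If you would rather not load dplyr, the same three steps (take the next start within each id, cap it at Surv_time, drop rows where start is not before stop) can be sketched in base R. This is only a rough equivalent, not a drop-in replacement: it rebuilds D from the values printed in the question so that it runs on its own, and the names next_start and out are just illustrative.

```r
# Sample data copied from the D printed in the question.
D <- data.frame(
  id        = rep(1:5, each = 3),
  start     = c(7, 13, 20, 5, 11, 17, 0, 7, 14, 1, 9, 17, 3, 11, 18),
  Surv_time = rep(c(14, 15, 17, 19, 14), each = 3)
)
D$zj <- 0.3 * D$start

# The shift below assumes rows are ordered by id and then by start.
D <- D[order(D$id, D$start), ]

# Within each id, the next start becomes this row's stop; the last row gets Inf.
next_start <- ave(D$start, D$id, FUN = function(s) c(s[-1], Inf))

# Cap every stop at the individual's survival time.
D$stop <- pmin(next_start, D$Surv_time)

# Keep only intervals that begin before they end.
out <- D[D$start < D$stop, c("id", "start", "stop", "Surv_time", "zj")]
out
```

ave() plays the role of group_by() plus lead() here, and pmin() replaces the ifelse() call.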
CodePudding user response:
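This might not be the most elegant solution, but it should work. It assumes the rows are sorted by id and then by start:
library(tidyverse)
D <- D %>%
  mutate(stop = c(start[2:nrow(D)],NA)) %>% # next row's start, regardless of id
  filter(start<=Surv_time) # drop measurements taken after Surv_time
# A stop beyond Surv_time, or one borrowed from the next individual (stop < start), is set to Surv_time
bad <- D$stop > D$Surv_time | D$stop < D$start
D$stop[bad] <- D$Surv_time[bad]
D <- D %>% select(id, start, stop, Surv_time, zj)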
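Whichever approach you use, it may be worth checking the result against the two requirements in the question: each stop equals the next start within an id, and the last stop equals Surv_time. A minimal sketch of such a check follows. check_start_stop is a made-up helper name, not a function from any package, and res is simply the expected output from the question typed in by hand.

```r
# Expected result copied from the question (zj omitted, it is not needed for the check).
res <- data.frame(
  id        = c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5),
  start     = c(7, 13, 5, 11, 0, 7, 14, 1, 9, 17, 3, 11),
  stop      = c(13, 14, 11, 15, 7, 14, 17, 9, 17, 19, 11, 14),
  Surv_time = c(14, 14, 15, 15, 17, 17, 17, 19, 19, 19, 14, 14)
)

# Hypothetical helper: TRUE if every id has contiguous, non-empty intervals
# whose final stop is that id's Surv_time.
check_start_stop <- function(df) {
  ok <- vapply(split(df, df$id), function(g) {
    g <- g[order(g$start), ]
    k <- nrow(g)
    contiguous <- all(g$stop[-k] == g$start[-1])  # each stop is the next start
    ends_right <- g$stop[k] == g$Surv_time[k]     # last stop is Surv_time
    positive   <- all(g$start < g$stop)           # no empty or reversed intervals
    contiguous && ends_right && positive
  }, logical(1))
  all(ok)
}

check_start_stop(res)
```

Passing the output of either answer (converted with as.data.frame() if it is a tibble) instead of res should give the same kind of TRUE/FALSE answer.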