Transform data to long format in R given survival time

Consider the following sample dataset.

* id represents an individual's identifier.
* Surv_time represents an individual's survival time.
* start represents the time at which zj is measured; zj is a time-varying covariate.

rm(list = ls()); set.seed(1)
n <- 5
Surv_time <- round(runif(n, 12, 20))  # survival time per individual
dat <- data.frame(id = 1:n, Surv_time)
ntp <- rep(3, n)                      # three measurements per individual
mat <- matrix(ncol = 2, nrow = 1)     # one NA row to rbind() onto
m <- 0; w <- mat
for (l in ntp) {
  m <- m + 1                          # id of the current individual
  ft <- seq(from = runif(1, 0, 8), to = runif(1, 12, 20), length.out = l)
  times <- round(ft)                  # measurement times of zj
  matid <- cbind(matrix(times, ncol = 1), m)
  w <- rbind(w, matid)
}

d <- data.frame(w[-1, ])              # drop the initial NA row
colnames(d) <- c("start", "id")
D <- merge(d, dat, by = "id")         # merging dataset
D$zj <- with(D, 0.3 * start)
D
   id start Surv_time  zj
1   1     7        14 2.1
2   1    13        14 3.9
3   1    20        14 6.0
4   2     5        15 1.5
5   2    11        15 3.3
6   2    17        15 5.1
7   3     0        17 0.0
8   3     7        17 2.1
9   3    14        17 4.2
10  4     1        19 0.3
11  4     9        19 2.7
12  4    17        19 5.1
13  5     3        14 0.9
14  5    11        14 3.3
15  5    18        14 5.4
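To try the answers without re-running the simulation, the D printed above can also be entered directly. This is a minimal sketch that copies those 15 rows; id and start are stored as doubles, as they are in the simulated version.

```r
# Rebuild the 15-row D printed above as a literal data frame
D <- data.frame(
  id        = rep(c(1, 2, 3, 4, 5), each = 3),
  start     = c(7, 13, 20,  5, 11, 17,  0, 7, 14,  1, 9, 17,  3, 11, 18),
  Surv_time = rep(c(14, 15, 17, 19, 14), each = 3)
)
D$zj <- 0.3 * D$start  # same definition as in the simulation code
D
```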

I need code to transform the data to start-stop format, where the last stop for each individual is at Surv_time. The idea is to create start-stop intervals in which the stop of one interval is the start of the next. Measurements taken after Surv_time (for example start = 20 for id 1, whose Surv_time is 14) should be dropped. I should end up with:

   id start stop Surv_time  zj
1   1     7   13        14 2.1
2   1    13   14        14 3.9
4   2     5   11        15 1.5
5   2    11   15        15 3.3
7   3     0    7        17 0.0
8   3     7   14        17 2.1
9   3    14   17        17 4.2
10  4     1    9        19 0.3
11  4     9   17        19 2.7
12  4    17   19        19 5.1
13  5     3   11        14 0.9
14  5    11   14        14 3.3
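For comparison, the rule in the question (stop = next start within the same id, capped at Surv_time) can be sketched in base R without dplyr. to_start_stop() is a hypothetical helper name, and the sketch assumes one Surv_time per id and columns named as in D; here it is exercised on ids 1 and 3 copied from the printed data.

```r
# Hypothetical helper: one-row-per-measurement data -> start-stop format.
# Assumes columns id, start, Surv_time (constant within id) and zj.
to_start_stop <- function(d) {
  d <- d[order(d$id, d$start), ]
  # next measurement time within the same id; Inf for each id's last row
  next_start <- ave(d$start, d$id, FUN = function(s) c(s[-1], Inf))
  # an interval stops at the next measurement or at Surv_time, whichever is first
  d$stop <- pmin(next_start, d$Surv_time)
  # measurements at or after Surv_time give stop <= start and are dropped
  d <- d[d$start < d$stop, c("id", "start", "stop", "Surv_time", "zj")]
  rownames(d) <- NULL
  d
}

# ids 1 and 3 from the printed D
ex <- data.frame(id = c(1, 1, 1, 3, 3, 3),
                 start = c(7, 13, 20, 0, 7, 14),
                 Surv_time = c(14, 14, 14, 17, 17, 17))
ex$zj <- 0.3 * ex$start
to_start_stop(ex)
```

Run on the full D, it should return the same 12 intervals as the expected output, with row names reset to 1 to 12.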

Answer 1:

We can use dplyr:

library(dplyr)

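D %>%
  group_by(id) %>%
  # stop = next start within the same id; Inf for each id's last row
  mutate(stop = lead(start, default = Inf),
         # never let an interval run past Surv_time
         stop = ifelse(stop > Surv_time, Surv_time, stop), .after = start) %>%
  # drops measurements taken at or after Surv_time
  filter(start < stop) %>%
  ungroup()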

# A tibble: 12 × 5
      id start  stop Surv_time    zj
   <dbl> <dbl> <dbl>     <dbl> <dbl>
 1     1     7    13        14   2.1
 2     1    13    14        14   3.9
 3     2     5    11        15   1.5
 4     2    11    15        15   3.3
 5     3     0     7        17   0  
 6     3     7    14        17   2.1
 7     3    14    17        17   4.2
 8     4     1     9        19   0.3
 9     4     9    17        19   2.7
10     4    17    19        19   5.1
11     5     3    11        14   0.9
12     5    11    14        14   3.3
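To confirm that a result has the structure asked for in the question (within each id every stop equals the next start, intervals have positive length, and the last stop equals Surv_time), a small base-R check can be run on the output of either answer. check_start_stop() is a hypothetical helper, sketched under the assumption that the columns are named id, start, stop and Surv_time.

```r
# Hypothetical helper: TRUE if, within each id, intervals are contiguous,
# have positive length, and the last one ends at Surv_time.
check_start_stop <- function(d) {
  d <- d[order(d$id, d$start), ]
  ok <- vapply(split(d, d$id), function(g) {
    n <- nrow(g)
    all(g$start < g$stop) &&
      all(g$stop[-n] == g$start[-1]) &&
      g$stop[n] == g$Surv_time[n]
  }, logical(1))
  all(ok)
}

# id 2 from the expected output, then a copy with a gap between 10 and 11
good <- data.frame(id = 2, start = c(5, 11), stop = c(11, 15), Surv_time = 15)
bad  <- transform(good, stop = c(10, 15))
check_start_stop(good)  # TRUE
check_start_stop(bad)   # FALSE
```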

Answer 2:

This might not be the most elegant solution, but it should work.

library(tidyverse)

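D <- D %>%
  group_by(id) %>%                       # shift start within each id only
  mutate(stop = c(start[-1], NA)) %>%    # next measurement time; NA for the last one
  ungroup() %>%
  filter(start <= Surv_time)             # drop measurements taken after Surv_time

# cap stop at Surv_time; each id's last interval (stop is NA) also ends at Surv_time
fix <- is.na(D$stop) | D$stop > D$Surv_time
D$stop[fix] <- D$Surv_time[fix]

D <- D %>% select(id, start, stop, Surv_time, zj)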
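The shift that builds stop has to happen within each id. Shifting start over the whole data frame without grouping carries the first start of the next id into the last row of the current id. In the printed D that leaked value happens to be smaller than start, so a stop < start check would catch it, but nothing guarantees that in general. A small base-R illustration using ids 3 and 4 from the printed data:

```r
d <- data.frame(id = c(3, 3, 3, 4, 4, 4),
                start = c(0, 7, 14, 1, 9, 17))

# Shift over the whole column: id 3's last row picks up id 4's first start (1)
ungrouped <- c(d$start[-1], NA)

# Shift within each id: the last row of every id gets NA instead
grouped <- ave(d$start, d$id, FUN = function(s) c(s[-1], NA))

cbind(d, ungrouped, grouped)
```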