Why does survfit (Survival R package) treat each row as a separate individual?

Using the survival package in R, we can use the "heart" dataset:

survfit(Surv(stop, event) ~ transplant, data = heart)

This outputs a model with n=172 (103 in the transplant=0 group and 69 in the transplant=1 group) and 75 events (30 in transplant=0; 45 in transplant=1).

And if we plot the K-M curve with the survminer package:

ggsurvplot(survfit(Surv(stop, event) ~ transplant, data = heart), risk.table = "nrisk_cumcensor", xlim=c(0,5*365), break.x.by = 365, conf.int=TRUE)

It shows that there are 103 and 69 individuals at risk to start with in each transplant group.

However, there are only 103 individuals in total (length(unique(heart$id))), not 172.
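The mismatch can be checked directly. This is only a minimal sketch, assuming the survival package is installed; the variable names n_rows and n_patients are made up for illustration:

```r
library(survival)

n_rows     <- nrow(heart)               # number of rows (intervals)
n_patients <- length(unique(heart$id))  # number of distinct individuals
c(rows = n_rows, patients = n_patients)

# Rows per patient: anyone with more than one row contributes several intervals.
table(table(heart$id))
```

So each row is an interval of follow-up, not a person, and Surv(stop, event) alone carries no information about where an interval starts.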

Trying to force the id with either "id" or "cluster" (e.g. survfit(Surv(stop, event) ~ transplant, id=id, cluster=id, data = heart)) doesn't change the result.

How can we make the model understand there are multiple lines for each individual?

CodePudding user response:

For this I would recommend looking into time-dependent Cox regression; there is a good vignette in the survival package (https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf).

There are several ways to account for multiple observations per patient. The simplest, time-dependent Cox regression, assumes that the covariates stay constant until the next observation. For each observation you define a start time and a stop time (the time of the next observation), plus a status indicator that records whether an event occurred during that interval. The data would look similar to:

     id time1 time2 status
1     1     0    30      0
2     1    30   100      1
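As a rough sketch, the layout above can be built by hand and compared with heart, which already stores its rows this way (columns start, stop and event). The data frame df below is a hypothetical one-patient example that just mirrors the table:

```r
library(survival)

# Hypothetical two-row example for one patient, matching the table above.
df <- data.frame(
  id     = c(1, 1),
  time1  = c(0, 30),
  time2  = c(30, 100),
  status = c(0, 1)
)

# Surv() with three arguments builds (start, stop] intervals.
with(df, Surv(time1, time2, status))

# heart uses the same layout, with columns start, stop and event.
head(heart[, c("id", "start", "stop", "event", "transplant")])
```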

And the Cox regression could then take the form

 coxph(Surv(time1, time2, status) ~ ., cluster = id, data = df)
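For the heart data from the question, a runnable version might look like the sketch below. The choice of covariates (transplant, age, surgery) is an assumption made for illustration, not something the question specifies. The same three-argument Surv() call can also be passed to survfit, so the risk sets are built from intervals rather than from one row per person:

```r
library(survival)

# Each row covers the interval (start, stop]; cluster = id requests a
# robust variance that groups the rows belonging to the same patient.
# The covariates chosen here are illustrative only.
fit_cox <- coxph(Surv(start, stop, event) ~ transplant + age + surgery,
                 cluster = id, data = heart)
summary(fit_cox)

# The same interval form works for Kaplan-Meier style curves: a patient is
# counted in a transplant group only while their current row belongs to it.
km <- survfit(Surv(start, stop, event) ~ transplant, data = heart)
print(km)
```

With this form a transplanted patient moves from the transplant=0 risk set to the transplant=1 risk set at the time of transplant instead of being counted as two people.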

There are other, more sophisticated methods for these analyses, such as multivariate models (so-called joint models), for which there are dedicated packages such as JM (https://cran.r-project.org/web/packages/JM/JM.pdf).