I am looking for a solution in dplyr.
Let's say I have this dataframe p
id treatment age sex progression pfs
1 1 SSTR 31.3 0 0 15.6
2 2 SSTR 36.9 0 1 8.9
3 3 SSTR 44.6 1 1 25.5
Each patient (id) received a treatment and was followed until progression (p$progression == 1) or until censored progression-free (p$progression == 0), with p$pfs as the follow-up time (in months).
I want to split each row into multiple rows, one for every 12-month interval of follow-up.
Some covariates do not change in each new row, while others should change.
Unchanged:
- p$id
- p$treatment
- p$sex
- (and others in my dataframe)
Covariates that change:
- p$age should increase by 1 for every new time interval (since each patient gets 1 year older per 12 months)
- p$pfs is split into two new covariates, p$start and p$stop, indicating the bounds of each 12-month interval
- A new covariate p$interval is created to indicate the interval number (0 - 12 months is interval == 1, 12 - 24 months is interval == 2, 24 - 36 months is interval == 3, and so on)
Expected output
id treatment sex age progression start stop interval
1 1 SSTR 0 31.3 0 0 12.0 1
2 1 SSTR 0 32.3 0 12 15.6 2
3 2 SSTR 0 36.9 1 0 8.9 1
4 3 SSTR 1 44.6 0 0 12.0 1
5 3 SSTR 1 45.6 0 12 24.0 2
6 3 SSTR 1 46.6 1 24 25.5 3
Data
p <- structure(list(id = 1:3, treatment = structure(c(1L, 1L, 1L), levels = c("SSTR",
"SSA", "Control"), class = "factor"), age = c(31.3, 36.9, 44.6
), sex = structure(c(1L, 1L, 2L), levels = c("0", "1"), class = "factor"),
progression = c(0L, 1L, 1L), pfs = c(15.6, 8.9, 25.5)), row.names = c(NA,
3L), class = "data.frame")
Answer:
Here's a solution using a list column with unnest:
library(dplyr)
library(purrr)
library(tidyr)
p %>%
  # one interval counter per started 12-month block: 1, 2, ..., pfs %/% 12 + 1
  mutate(interval = map(pfs %/% 12L + 1L, seq_len)) %>%
  unnest(interval) %>%
  mutate(start = 12L * (interval - 1L),
         stop = pmin(pfs, 12L * interval),
         age = age + (interval - 1L)) %>%
  group_by(id) %>%
  # progression can only occur in a patient's last interval
  mutate(progression = if_else(interval != max(interval), 0L, progression)) %>%
  select(id, treatment, sex, age, progression, start, stop, interval)
# # A tibble: 6 × 8
# # Groups: id [3]
# id treatment sex age progression start stop interval
# <int> <fct> <fct> <dbl> <int> <int> <dbl> <int>
# 1 1 SSTR 0 31.3 0 0 12 1
# 2 1 SSTR 0 32.3 0 12 15.6 2
# 3 2 SSTR 0 36.9 1 0 8.9 1
# 4 3 SSTR 1 44.6 0 0 12 1
# 5 3 SSTR 1 45.6 0 12 24 2
# 6 3 SSTR 1 46.6 1 24 25.5 3
The idea is to first create a list column holding the interval counters for each patient and then unnest it, which expands each patient into one row per interval. Here is the solution in slow-mo:
p %>%
  mutate(interval = map(pfs %/% 12L + 1L, seq_len))
# id treatment age sex progression pfs interval
# 1 1 SSTR 31.3 0 0 15.6 1, 2
# 2 2 SSTR 36.9 0 1 8.9 1
# 3 3 SSTR 44.6 1 1 25.5 1, 2, 3
p %>%
  mutate(interval = map(pfs %/% 12L + 1L, seq_len)) %>%
  unnest(interval)
# # A tibble: 6 × 7
# id treatment age sex progression pfs interval
# <int> <fct> <dbl> <fct> <int> <dbl> <int>
# 1 1 SSTR 31.3 0 0 15.6 1
# 2 1 SSTR 31.3 0 0 15.6 2
# 3 2 SSTR 36.9 0 1 8.9 1
# 4 3 SSTR 44.6 1 1 25.5 1
# 5 3 SSTR 44.6 1 1 25.5 2
# 6 3 SSTR 44.6 1 1 25.5 3
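The number of rows per patient comes from the integer division pfs %/% 12 + 1. The following is a minimal base R sketch of just that arithmetic, using the three pfs values from the question plus a hypothetical fourth patient with exactly 24 months of follow-up (not in the original data). The ceiling() variant is an assumption on my part, shown as one possible way to handle exact multiples of 12; it is not part of the solution above.

```r
# pfs values from the question, plus a hypothetical patient with exactly 24 months
pfs <- c(15.6, 8.9, 25.5, 24)

# Counting used in the solution above: completed 12-month blocks plus one
n_floor <- pfs %/% 12 + 1

# Possible alternative (an assumption, not part of the original answer):
# round up, but keep at least one interval so that pfs == 0 still yields a row
n_ceil <- pmax(1, ceiling(pfs / 12))

# Expand each count into its interval counters, as map(..., seq_len) does
intervals <- lapply(n_floor, seq_len)

print(rbind(pfs, n_floor, n_ceil))
```

For the hypothetical 24-month patient, the %/% version produces a third interval with start and stop both equal to 24, i.e. a zero-length interval, while the ceiling() version stops at two intervals. For the data in the question both versions give the same counts.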
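The rest is rather straightforward: start and stop follow from the interval number, age increases by one per interval, and progression is set to 0 in every interval except a patient's last.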
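To make those remaining steps concrete, here is a small base R sketch for a single patient (patient 3 from the question). It mirrors the arithmetic of the second mutate() and the grouped if_else() step, but on plain vectors; the names age_i and prog_i are only illustrative and do not appear in the solution.

```r
# Patient 3 from the question: pfs = 25.5, age = 44.6, progression = 1
pfs <- 25.5
age <- 44.6
progression <- 1L

interval <- seq_len(pfs %/% 12 + 1)   # 1 2 3

start <- 12 * (interval - 1)          # lower bound of each interval
stop  <- pmin(pfs, 12 * interval)     # upper bound, capped at the end of follow-up
age_i <- age + (interval - 1)         # one year older per interval

# The event can only fall in the last interval; earlier ones are event-free
prog_i <- ifelse(interval == max(interval), progression, 0L)

print(data.frame(interval, start, stop, age = age_i, progression = prog_i))
```

The resulting three rows correspond to rows 4 to 6 of the expected output. In the dplyr version, group_by(id) is what makes max(interval) refer to each patient's own last interval rather than the maximum over the whole data frame.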