I have event data at the population level, i.e. for each day there is a count of individuals with events and a count of individuals censored. I would like to expand this data into the more traditional format for survival analysis, i.e. each individual gets a row. So for each day, rows need to be added for the number of events (with event = 1 and censor = 0) and for the number censored (with event = 0 and censor = 1). Below is an example of an input data frame (dataIn) and of the desired output.
days <- c(1,2,3)
event <- c(2,2,0)
censor <- c(0,2,2)
dataIn <- data.frame(days, event, censor)
Input:
days event censor
1 2 0
2 2 2
3 0 2
Desired output:
days event censor
1 1 0
1 1 0
2 1 0
2 1 0
2 0 1
2 0 1
3 0 1
3 0 1
CodePudding user response:
Here's a fairly pedestrian but effective way of doing it using rep:
with(dataIn, data.frame(day = c(rep(days, event), rep(days, censor)),
event = rep(c(1, 0), c(sum(event), sum(censor))),
censor = rep(c(0, 1), c(sum(event), sum(censor)))))
#> day event censor
#> 1 1 1 0
#> 2 1 1 0
#> 3 2 1 0
#> 4 2 1 0
#> 5 2 0 1
#> 6 2 0 1
#> 7 3 0 1
#> 8 3 0 1
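The approach above relies on rep accepting a vector of counts as its second argument: each element of the first argument is repeated that many times, and a count of zero drops the element. A minimal illustration with the question's data (the names event_days and censor_days are introduced here only for the example):

```r
days <- c(1, 2, 3)
event <- c(2, 2, 0)
censor <- c(0, 2, 2)

# Each day repeated once per event; day 3 has zero events and is dropped
event_days <- rep(days, event)

# Each day repeated once per censoring; day 1 has zero and is dropped
censor_days <- rep(days, censor)

print(event_days)
print(censor_days)
```

Because the data frame is built from all event rows followed by all censored rows, the result is not sorted by day in general (it happens to be for this data); sorting with order() afterwards would restore day order if that matters.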
CodePudding user response:
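pmap (here its row-binding variant pmap_dfr, from purrr) allows us to apply a function to each row (day). Then, we can rely on vector recycling to fill in the zeros and the days. Note that bind_rows(tibble(), tibble()) does not throw an error, so days with zero events or zero censored individuals, which produce empty tibbles, are handled without special cases.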
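library(purrr)
library(tibble)
# ..1, ..2, ..3 are the days, event and censor values of the current row
pmap_dfr(dataIn, ~ list(
  tibble(days = ..1, event = rep(1, ..2), censor = 0),
  tibble(days = ..1, event = 0, censor = rep(1, ..3))
))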
# A tibble: 8 x 3
days event censor
<dbl> <dbl> <dbl>
1 1 1 0
2 1 1 0
3 2 1 0
4 2 1 0
5 2 0 1
6 2 0 1
7 3 0 1
8 3 0 1
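To see why zero counts are harmless here, the following sketch builds the two tibbles for day 3 of the question's data by hand (0 events, 2 censored). The names no_events, censored and combined are introduced only for this illustration.

```r
library(tibble)
library(dplyr)

# Day 3: rep(1, 0) is a zero-length vector, so the length-one columns
# (days and censor) are recycled to length zero as well
no_events <- tibble(days = 3, event = rep(1, 0), censor = 0)

# Two censored individuals: days and event are recycled to length two
censored <- tibble(days = 3, event = 0, censor = rep(1, 2))

# Binding a zero-row tibble contributes no rows and raises no error
combined <- bind_rows(no_events, censored)
print(nrow(no_events))
print(combined)
```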
CodePudding user response:
We could use uncount from tidyr:
library(dplyr)
library(tidyr)
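dataIn %>%
  # repeat each day's row once per individual (events plus censored)
  uncount(event + censor) %>%
  # convert the counts to 0/1 indicators
  mutate(across(event:censor, ~ +(. > 0)))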
-output
days event censor
1 1 1 0
2 1 1 0
3 2 1 1
4 2 1 1
5 2 1 1
6 2 1 1
7 3 0 1
8 3 0 1
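Note that for day 2, where both counts are nonzero, the output above flags every row as both an event and a censoring, which differs from the desired output in the question. A possible variant, sketched here on the assumption that each row should carry exactly one of the two flags, reshapes to long format before calling uncount. The helper column names type and n, and the result name expanded, are chosen only for this example.

```r
library(dplyr)
library(tidyr)

dataIn <- data.frame(days = c(1, 2, 3), event = c(2, 2, 0), censor = c(0, 2, 2))

expanded <- dataIn %>%
  # one row per (day, type) pair, with the count stored in n
  pivot_longer(c(event, censor), names_to = "type", values_to = "n") %>%
  # repeat each row n times; rows with n == 0 vanish and n is dropped
  uncount(n) %>%
  # turn the type label back into two 0/1 indicator columns
  mutate(event = as.integer(type == "event"),
         censor = as.integer(type == "censor")) %>%
  select(-type)

print(expanded)
```

Since pivot_longer keeps the event row ahead of the censor row within each day, the result stays sorted by day, with events listed before censored individuals.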