Expand dataframe by adding rows and values based on numbers in dataframe

I have event data at the population level, i.e. for each day there is a count of individuals with events and a count of individuals censored. I would like to expand this data to a more traditional format for survival analysis, i.e. each individual gets a row. So for each day, a number of rows needs to be added for the number of events (with event = 1 and censor = 0) and for the number censored (with event = 0 and censor = 1). Below is an example of an input data frame (dataIn) and of the desired output.

days <- c(1,2,3)
event <- c(2,2,0)
censor <- c(0,2,2)
dataIn <- data.frame(days, event, censor)

  days event censor
    1     2      0
    2     2      2
    3     0      2

Desired output:

  days event censor
    1     1      0
    1     1      0
    2     1      0
    2     1      0
    2     0      1
    2     0      1
    3     0      1
    3     0      1

CodePudding user response：

Here's a fairly pedestrian but effective way of doing it using rep:

with(dataIn, data.frame(day    = c(rep(days, event), rep(days, censor)), 
                        event  = rep(c(1, 0), c(sum(event), sum(censor))),
                        censor = rep(c(0, 1), c(sum(event), sum(censor)))))
#>   day event censor
#> 1   1     1      0
#> 2   1     1      0
#> 3   2     1      0
#> 4   2     1      0
#> 5   2     0      1
#> 6   2     0      1
#> 7   3     0      1
#> 8   3     0      1
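
One thing to keep in mind with the rep approach is that it stacks all event rows first and all censored rows second, so the result comes out sorted by day only when the data happen to line up as they do in this example. The following is a minimal sketch, assuming the same dataIn as in the question, that sorts the result and checks it by aggregating back to daily counts. The names expanded and check are purely illustrative, and the output column is called days here to match the desired output.

```r
dataIn <- data.frame(days = c(1, 2, 3), event = c(2, 2, 0), censor = c(0, 2, 2))

# Same rep() construction as above, stored so it can be checked
expanded <- with(dataIn, data.frame(
  days   = c(rep(days, event), rep(days, censor)),
  event  = rep(c(1, 0), c(sum(event), sum(censor))),
  censor = rep(c(0, 1), c(sum(event), sum(censor)))
))

# Sort by day, with event rows before censored rows within a day
expanded <- expanded[order(expanded$days, -expanded$event), ]
rownames(expanded) <- NULL

# Round trip: summing the indicators per day should recover the counts.
# Days with neither events nor censoring would be absent from `check`.
check <- aggregate(cbind(event, censor) ~ days, data = expanded, FUN = sum)
stopifnot(nrow(expanded) == sum(dataIn$event) + sum(dataIn$censor))
```

If the row count and the per-day sums in check agree with dataIn, the expansion has neither dropped nor duplicated individuals.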

CodePudding user response：

pmap allows us to apply a function to each row (day). Then, we can rely on vector recycling to fill the zeros and the days. Note that bind_rows(tibble(), tibble()) does not throw an error.

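library(purrr)   # pmap_dfr
library(tibble)  # tibble

# For each row (day), build one tibble of event rows and one of censored rows
pmap_dfr(dataIn, ~ list(
  tibble(days = ..1, event = rep(1, ..2), censor = 0),
  tibble(days = ..1, event = 0, censor = rep(1, ..3))
  )
)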

# A tibble: 8 x 3
   days event censor
  <dbl> <dbl>  <dbl>
1     1     1      0
2     1     1      0
3     2     1      0
4     2     1      0
5     2     0      1
6     2     0      1
7     3     0      1
8     3     0      1
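
The recycling behaviour this answer relies on can be checked in isolation. This is a small sketch, assuming the tibble and dplyr packages are installed. The names no_events and two_censored are made up for illustration and stand for day 3 of the example, which has zero events and two censored individuals.

```r
library(tibble)
library(dplyr)

# rep(1, 0) is a zero-length vector; the length-one columns (days, censor)
# are recycled to that length, so the tibble has zero rows.
no_events <- tibble(days = 3, event = rep(1, 0), censor = 0)

# Here the length-one columns are recycled up to length two instead.
two_censored <- tibble(days = 3, event = 0, censor = rep(1, 2))

# Binding a zero-row tibble contributes nothing, so only the
# censored rows remain for this day.
day3 <- bind_rows(no_events, two_censored)
nrow(day3)
```

This is why days with a zero count need no special handling: they produce empty tibbles that vanish when the pieces are bound together.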

CodePudding user response：

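We could use uncount from tidyr. Note that this repeats each day event + censor times but then flags every row of a day with both indicators, so on days that have both events and censored individuals (day 2 here) the result differs from the desired output.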

library(dplyr)
library(tidyr)
dataIn %>%
    uncount(event + censor) %>% 
    mutate(across(event:censor, ~ +(. > 0)))

Output:

   days event censor
1    1     1      0
2    1     1      0
3    2     1      1
4    2     1      1
5    2     1      1
6    2     1      1
7    3     0      1
8    3     0      1
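
In the output above, the four rows for day 2 have both event = 1 and censor = 1, whereas the question asks for two event rows and two censored rows. One possible way to keep uncount and still separate the two groups is to reshape to long format first, so that each day/type pair carries its own count. This is a sketch rather than the answerer's method; the column names type and n and the object name expanded are introduced here for illustration.

```r
library(dplyr)
library(tidyr)

dataIn <- data.frame(days = c(1, 2, 3), event = c(2, 2, 0), censor = c(0, 2, 2))

expanded <- dataIn %>%
  # One row per day and type ("event" or "censor"), with its count in n
  pivot_longer(c(event, censor), names_to = "type", values_to = "n") %>%
  # Repeat each row n times; rows with n = 0 are dropped, and n is removed
  uncount(n) %>%
  # Rebuild the 0/1 indicator columns from the type label
  mutate(event  = as.numeric(type == "event"),
         censor = as.numeric(type == "censor")) %>%
  select(days, event, censor)

expanded
```

Because pivot_longer keeps the rows of each day together, the result is ordered by day with event rows before censored rows, as in the desired output.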