Expand table of counts to a dataframe-CodePudding

Given a table of counts specified in 'dat' I would like to create a dataframe with 3 columns (race, grp and outcome) and 206 rows. The variable outcome would be 1 if for ascertained, and 0 if 'missed'.

dat <- structure(list(race = structure(c(1L, 2L, 1L, 2L), levels = c("black", 
"nonblack"), class = "factor"), grp = structure(c(1L, 1L, 2L, 
2L), levels = c("hbpm", "uc"), class = "factor"), ascertained = c(63, 
32, 24, 21), missed = c(5, 3, 49, 9), total = c(68, 35, 73, 30
)), class = "data.frame", row.names = c(NA, -4L))

CodePudding user response：

1) For each row set race in the output to that race, grp in the output to that group and then generate the appropriate number of 1s and 0s for outcome. The result is 206 x 3.

library(dplyr)

dat %>%
  rowwise %>%
  summarize(race = race, grp = grp, outcome = rep(1:0, c(ascertained, missed)))

2) In the example data there are no duplicate race/grp and if that is true in general then it can alternately be written as::

dat %>%
  group_by(race, grp) %>%
  summarize(outcome = rep(1:0, c(ascertained, missed)), .groups = "drop")

3) A base R solution would be the following. If each combination of race/grp occurs on only one row of the input then 1:nrow(dat) could optionally be replaced with dat[1:2].

do.call("rbind", 
  by(dat, 
     1:nrow(dat), 
     with, 
     data.frame(race = race, grp = grp, outcome = rep(1:0, c(ascertained, missed)))
  )
)

CodePudding user response：

How about this:

library(tidyverse)
dat <- structure(list(race = structure(c(1L, 2L, 1L, 2L), levels = c("black", 
                                                                     "nonblack"), class = "factor"), grp = structure(c(1L, 1L, 2L, 
                                                                                                                       2L), levels = c("hbpm", "uc"), class = "factor"), ascertained = c(63, 
                                                                                                                                                                                         32, 24, 21), missed = c(5, 3, 49, 9), total = c(68, 35, 73, 30
                                                                                                                                                                                         )), class = "data.frame", row.names = c(NA, -4L))
dat2 <- dat %>% select(-total) %>% 
  pivot_longer(c(ascertained, missed), names_to = "var", values_to="vals") %>% 
  uncount(vals) %>% 
  mutate(outcome = case_when(var == "ascertained" ~ 1, 
                             TRUE ~ 0)) %>% 
  select(-var)
head(dat2)
#> # A tibble: 6 × 3
#>   race  grp   outcome
#>   <fct> <fct>   <dbl>
#> 1 black hbpm        1
#> 2 black hbpm        1
#> 3 black hbpm        1
#> 4 black hbpm        1
#> 5 black hbpm        1
#> 6 black hbpm        1

dat2 %>% 
  group_by(race, grp, outcome) %>% 
  tally()
#> # A tibble: 8 × 4
#> # Groups:   race, grp [4]
#>   race     grp   outcome     n
#>   <fct>    <fct>   <dbl> <int>
#> 1 black    hbpm        0     5
#> 2 black    hbpm        1    63
#> 3 black    uc          0    49
#> 4 black    uc          1    24
#> 5 nonblack hbpm        0     3
#> 6 nonblack hbpm        1    32
#> 7 nonblack uc          0     9
#> 8 nonblack uc          1    21

CodePudding user response：

This is based partially on the linked question from Limey in the comments:

library(tidyverse)

bind_rows(
  dat %>% uncount(ascertained) %>% mutate(outcome = 1) %>% select(-missed, -total), 
  dat %>% uncount(missed) %>% mutate(outcome = 0) %>% select(-ascertained, -total)
)

CodePudding user response：

Here is a relatively simple answer that is based on, in part, the answer suggested in a comment, but adapted to work for your problem, since you need multiple "uncounts". This answer uses function from the packages tibble, dplyr, and tidyr. These are all in the tidyverse.
The exact method is to create two sub-lists, one listing out the "ascertained", and one listing out the "missed", formatting the ascertained column as you wanted, and then mashing these two together with a basic tibble::add_row.
The relevant code is:

 library(tidyverse)
    dat2 <- uncount(dat, ascertained, .remove = F) %>%
            mutate(ascertained = 1) %>%
            select(-missed)
    dat3 <- uncount(dat, missed, .remove = T) %>% 
            mutate(ascertained = 0)
    dat4 <- add_row(dat2, dat3) %>% select(-total) %>%
            rename(outcome = ascertained)

dat4 should be the data as you asked for it. I would suggest also generating an id column to make things easier to work with, but obviously that is up to you.