Let's suppose we have a big data.frame named df with three categorical variables and a counter column:
Gender: which can be M or F (2 possible values)
Hair: which can be "black", "brown", "blond", "red" or "other" (5 possible values)
Sport: which can be "yes" or "no" (2 possible values)
Value: always 1, so that summing it counts the number of events
When I use the collap function from the collapse package, I run the following code:
collap(df, ~ Gender + Hair + Sport, FUN = sum, cols = "Value")
What I expect is a data.frame with 20 rows (one per combination: 2 × 5 × 2); however, if a combination has no occurrences, its row does not appear.
How can I get all possible combinations, with a 0 where there are no events with the given values?
CodePudding user response:
You can complete unused factor levels like this, which yields a row for females even though every row in the data is male:
library(tidyverse)
library(collapse)
#> collapse 1.7.6, see ?`collapse-package` or ?`collapse-documentation`
#>
#> Attaching package: 'collapse'
#> The following object is masked from 'package:stats':
#>
#> D
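data <- tribble(
  ~Gender, ~Hair, ~Value,
  "M", "black", 1
)
data %>%
  # declare both levels so that complete() knows "F" exists
  mutate(Gender = Gender %>% factor(levels = c("M", "F"))) %>%
  # add a row for every missing Gender level, with Value set to 0
  complete(Gender, fill = list(Value = 0)) %>%
  collap(~ Gender + Hair, FUN = sum, cols = "Value")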
#> # A tibble: 2 × 3
#>   Gender Hair  Value
#>   <fct>  <chr> <dbl>
#> 1 M      black     1
#> 2 F      <NA>      0
Created on 2022-05-03 by the reprex package (v2.0.0)
CodePudding user response:
This is the answer to my question, based on the response by @danloo:
df %>%
  complete(Gender, Hair, Sport) %>%
  collap(~ Gender + Hair + Sport, FUN = sum, cols = "Value")
Running that, I get a data.frame with 20 rows, where Value is NA for the combinations with no events.
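To get a 0 instead of NA for the empty combinations, as the question originally asked, the fill argument of complete() from the first answer can be combined with the three-variable call. The following is only a minimal sketch: the toy df is made up for illustration, and it assumes the three grouping columns are factors with all their levels declared, so that complete() can also expand values that never occur in the data.

```r
library(tidyr)
library(collapse)

# Hypothetical toy data: only 2 of the 20 possible combinations occur
df <- data.frame(
  Gender = factor(c("M", "F", "M"), levels = c("M", "F")),
  Hair   = factor(c("black", "red", "black"),
                  levels = c("black", "brown", "blond", "red", "other")),
  Sport  = factor(c("yes", "no", "yes"), levels = c("yes", "no")),
  Value  = 1
)

# Add one row per missing combination, with Value set to 0 instead of NA
full <- complete(df, Gender, Hair, Sport, fill = list(Value = 0))

# Sum Value within each Gender/Hair/Sport group
res <- collap(full, ~ Gender + Hair + Sport, FUN = sum, cols = "Value")

nrow(res)            # 20 rows, one per combination
sum(res$Value == 0)  # 18 combinations with no events
```

If the grouping columns are plain character vectors, complete() only knows the values present in the data, so converting them to factors with explicit levels first (as in the first answer) is what guarantees all 20 rows.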