Missing combinations when collapsing a data.frame [because 0 occurrences]

Time:05-03

Suppose we have a large data.frame named df with four variables:

  • Gender: M or F (2 possible values)
  • Hair: "black", "brown", "blond", "red" or "other" (5 possible values)
  • Sport: "yes" or "no" (2 possible values)
  • Value: always 1, so that summing it counts the number of events

Using the collap() function from the collapse package, I run the following code:

collap(df, ~ Gender + Hair + Sport, FUN = sum, cols = "Value")

I expect a data.frame with 20 rows (one per combination, 2 × 5 × 2 = 20); however, combinations with no occurrences do not appear in the result.
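To see where the missing rows come from, here is a minimal base-R sketch on a small, hypothetical df (the data values are made up for illustration). expand.grid() enumerates the 20 combinations the question expects, while a grouped sum with aggregate(), like collap(), only returns one row per combination that actually occurs in the data.

```r
# Hypothetical toy data standing in for the question's df
df <- data.frame(
  Gender = c("M", "M", "F", "F", "M"),
  Hair   = c("black", "black", "blond", "red", "brown"),
  Sport  = c("yes", "no", "yes", "yes", "no"),
  Value  = 1
)

# Every combination the question expects: 2 * 5 * 2 = 20
all_combos <- expand.grid(
  Gender = c("M", "F"),
  Hair   = c("black", "brown", "blond", "red", "other"),
  Sport  = c("yes", "no"),
  stringsAsFactors = FALSE
)
nrow(all_combos)
#> [1] 20

# A grouped sum only returns the combinations present in the data
observed <- aggregate(Value ~ Gender + Hair + Sport, data = df, FUN = sum)
nrow(observed)
#> [1] 5
```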

How can I get all possible combinations, with a 0 for those that have no events?

Answer:

You can complete unused factor levels with tidyr's complete() before collapsing. In this example every row in the data is male, yet the result still contains a row for females:

library(tidyverse)
library(collapse)
#> collapse 1.7.6, see ?`collapse-package` or ?`collapse-documentation`
#> 
#> Attaching package: 'collapse'
#> The following object is masked from 'package:stats':
#> 
#>     D

data <- tribble(
  ~Gender, ~Hair, ~Value,
  "M", "black", 1
)

data %>%
  mutate(Gender = Gender %>% factor(levels = c("M", "F"))) %>%
  complete(Gender, fill = list(Value = 0)) %>%
  collap(~ Gender + Hair, FUN = sum, cols = "Value")
#> # A tibble: 2 × 3
#>   Gender Hair  Value
#>   <fct>  <chr> <dbl>
#> 1 M      black     1
#> 2 F      <NA>      0

Created on 2022-05-03 by the reprex package (v2.0.0)
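The answer above completes only Gender. A sketch extending the same idea to all three grouping variables, assuming the dplyr, tidyr and collapse packages are installed and using a small hypothetical df: declaring every allowed value as a factor level lets complete() generate combinations even for values that never occur in the data, and fill = list(Value = 0) makes the empty combinations sum to 0 rather than NA. It uses collapse's fsum() in place of base sum; both should give the same totals here.

```r
library(dplyr)
library(tidyr)
library(collapse)

# Hypothetical toy data: most combinations never occur
df <- tibble(
  Gender = c("M", "M", "F"),
  Hair   = c("black", "black", "red"),
  Sport  = c("yes", "no", "yes"),
  Value  = 1
)

counts <- df %>%
  # Declare every allowed value as a factor level so complete()
  # can create combinations for values absent from the data
  mutate(
    Gender = factor(Gender, levels = c("M", "F")),
    Hair   = factor(Hair, levels = c("black", "brown", "blond", "red", "other")),
    Sport  = factor(Sport, levels = c("yes", "no"))
  ) %>%
  # Add one row per missing combination, with Value = 0
  complete(Gender, Hair, Sport, fill = list(Value = 0)) %>%
  # Sum Value within each Gender/Hair/Sport group
  collap(~ Gender + Hair + Sport, FUN = fsum, cols = "Value")

nrow(counts)
#> [1] 20
sum(counts$Value)
#> [1] 3
```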

Answer:

Based on the answer by @danloo, this solves my question:

df %>%
  complete(Gender, Hair, Sport) %>%
  collap(~ Gender + Hair + Sport, FUN = sum, cols = "Value")

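Running this gives a data.frame with 20 rows, with NA in Value for the combinations that have no events. Adding fill = list(Value = 0) to complete(), as in the answer above, gives 0 instead of NA. Note that complete() only combines values that occur somewhere in the data (for factors, all of their levels), so a value that never occurs must first be declared as a factor level.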
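Because Value is always 1, summing it is the same as counting rows, so a package-free alternative (a swapped-in technique, not what either answer uses) is base R's table() on factors with explicit levels: table() reports every level combination, including those with a count of 0. A minimal sketch with a small hypothetical df:

```r
# Hypothetical toy data; Value is always 1, so a sum equals a row count
df <- data.frame(
  Gender = c("M", "M", "F"),
  Hair   = c("black", "black", "red"),
  Sport  = c("yes", "no", "yes"),
  Value  = 1
)

# table() on factors with the full set of levels keeps zero-count cells;
# as.data.frame() turns the 2 x 5 x 2 table into one row per combination
counts <- as.data.frame(
  table(
    Gender = factor(df$Gender, levels = c("M", "F")),
    Hair   = factor(df$Hair, levels = c("black", "brown", "blond", "red", "other")),
    Sport  = factor(df$Sport, levels = c("yes", "no"))
  ),
  responseName = "Value"
)

nrow(counts)
#> [1] 20
```

This only works for counting. If Value could take other values, the complete() + collap() route, or a merge() of an expand.grid() of all combinations with an aggregate() result, would be needed instead.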
