Handling missing values in R

Time:03-09

I've used the group_by function in R as follows:

data = r %>%
  group_by(Name, yp) %>%
  summarise(nb = n()) %>%
  mutate(Frac = nb / sum(nb))

This is what I get:

Name   yp   nb   Frac
0_S    0    1    0.03030303
0_S    1    20   0.60606061
0_S    2    12   0.36363636
1_S    1    16   0.59259259
1_S    2    11   0.40740741

Each value of Name should have three rows, one per value of yp (0, 1, 2). When a combination does not occur in the data, its row is simply absent from the result, whereas I would like it to appear with a count of 0. For example, since yp = 0 never occurs for 1_S, I would like a 1_S 0 row to be added:

Name   yp   nb   Frac
0_S    0    1    0.03030303
0_S    1    20   0.60606061
0_S    2    12   0.36363636
1_S    0    0    0
1_S    1    16   0.59259259
1_S    2    11   0.40740741

Reproducible example:

library(dplyr)

Df <- data.frame(A = c('0_S','0_S','0_S','0_S','0_S','0_S','1_S','1_S','1_S','1_S','1_S','1_S'),
                 B = c(0,0,1,1,2,2,1,1,1,1,2,2),
                 C = c(0,0,1,1,2,2,0,0,1,1,2,2))
Df

DDf = Df %>%
  group_by(A,B) %>%
  summarise(n = n()) %>%
  mutate(Frac = n / sum(n))

head(DDf)

Answer:

You can use tidyr::complete:

library(tidyverse)

DDf %>%
  ungroup() %>% 
  complete(A, B, fill = list(n = 0, Frac = 0))

# A tibble: 6 x 4
  A         B     n  Frac
  <chr> <dbl> <dbl> <dbl>
1 0_S       0     2 0.333
2 0_S       1     2 0.333
3 0_S       2     2 0.333
4 1_S       0     0 0    
5 1_S       1     4 0.667
6 1_S       2     2 0.333
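
The ungroup() step matters here: in recent tidyr versions, complete() works within each group of a grouped data frame, so it should be called on ungrouped data when the grouping column itself is part of the completion. A minimal sketch of a variant, assuming dplyr and tidyr are installed: it counts first, fills in the missing (A, B) combinations, and only then computes the fraction, so only n needs a fill value. The name res is illustrative and not from the original post.

```r
library(dplyr)
library(tidyr)

# Same A and B columns as in the question's reproducible example
Df <- data.frame(
  A = c('0_S','0_S','0_S','0_S','0_S','0_S','1_S','1_S','1_S','1_S','1_S','1_S'),
  B = c(0,0,1,1,2,2,1,1,1,1,2,2)
)

res <- Df %>%
  count(A, B) %>%                          # one row per observed (A, B) pair, ungrouped
  complete(A, B, fill = list(n = 0L)) %>%  # add unobserved pairs with n = 0
  group_by(A) %>%
  mutate(Frac = n / sum(n)) %>%            # fraction within each value of A
  ungroup()

print(res)
```

Because the fraction is computed after the zero-count rows are added, the 1_S / 0 row gets Frac = 0 from the division itself rather than from a fill value.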

data

Df <- data.frame(A = c('0_S','0_S','0_S','0_S','0_S','0_S','1_S','1_S','1_S','1_S','1_S','1_S'),
                 B = c(0,0,1,1,2,2,1,1,1,1,2,2),
                 C = c(0,0,1,1,2,2,0,0,1,1,2,2))
DDf = Df %>%
  group_by(A,B) %>%
  summarise(n = n()) %>%
  mutate(Frac = n / sum(n))
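
As a cross-check that does not rely on tidyr, base R's table() also reports a zero count for combinations that never occur. This is a different technique from the complete() approach above, shown only as a sketch; the names tab, frac and out are illustrative.

```r
# Same A and B columns as in the question's reproducible example
Df <- data.frame(
  A = c('0_S','0_S','0_S','0_S','0_S','0_S','1_S','1_S','1_S','1_S','1_S','1_S'),
  B = c(0,0,1,1,2,2,1,1,1,1,2,2)
)

tab  <- table(A = Df$A, B = Df$B)       # 2 x 3 contingency table, zeros included
frac <- prop.table(tab, margin = 1)     # row-wise fractions (within each A)

out <- as.data.frame(tab, responseName = "n")  # long format: columns A, B, n
out$Frac <- as.data.frame(frac)$Freq           # same cell order as tab

print(out[order(out$A, out$B), ])
```

Note that A and B come back as factors in out, so B would need converting if numeric values are required later.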