Home > Mobile >  Complete AND expand missing data in two columns
Complete AND expand missing data in two columns

Time:05-05

I have two columns I'm trying to complete and expand at the same time. Here's a sample data set.

library(tibble)
library(dplyr)
library(tidyr)    

# Sample data
df <- tibble(
  type = c("apple", "apple", "apple", "orange", "orange", "orange", "pear", "pear"),
  year = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2012),
  val = c(1:8))

df
# A tibble: 8 x 3
  type    year   val
  <chr>  <dbl> <int>
1 apple   2010     1
2 apple   2011     2
3 apple   2012     3
4 orange  2010     4
5 orange  2011     5
6 orange  2012     6
7 pear    2010     7
8 pear    2012     8

First, the type "pear" is missing the year "2011". Additionally, type is missing one value that COULD be in the data set but currently isn't. This missing value of type is "banana". I'd like to include "banana" while also filling in the missing years (2010:2012) associated with all the types.

As of now, I can only do one or the other. I'd think there's a way to do both. The problem with the fill argument in complete() is that it only allows a single value to fill in missing elements.

# Want to complete and expand
# Missing year 2011 in "pear" type and missing "banana" type so want to include and fill years 2010:2012

# complete
df %>% 
    complete(type = c("apple", "orange", "pear", "banana"), 
             fill = list(val = 0))
# A tibble: 9 x 3
  type    year   val
  <chr>  <dbl> <int>
1 apple   2010     1
2 apple   2011     2
3 apple   2012     3
4 banana    NA     0
5 orange  2010     4
6 orange  2011     5
7 orange  2012     6
8 pear    2010     7
9 pear    2012     8

# expand
df %>% 
    expand(type = c("apple", "orange", "pear", "banana"), year)
# A tibble: 12 x 2
   type    year
   <chr>  <dbl>
 1 apple   2010
 2 apple   2011
 3 apple   2012
 4 banana  2010
 5 banana  2011
 6 banana  2012
 7 orange  2010
 8 orange  2011
 9 orange  2012
10 pear    2010
11 pear    2011
12 pear    2012

My expected output is:

# A tibble: 12 x 3
   type    year   val
   <chr>  <dbl> <dbl>
 1 apple   2010     1
 2 apple   2011     2
 3 apple   2012     3
 4 orange  2010     4
 5 orange  2011     5
 6 orange  2012     6
 7 pear    2010     7
 8 pear    2011     0
 9 pear    2012     8
10 banana  2010     0
11 banana  2011     0
12 banana  2012     0

I could reference df twice like the following, but I'd like to find a way to not have to do that if possible.

df %>% 
    expand(type = c("apple", "orange", "pear", "banana"), year) %>% 
    left_join(df, by = c("type", "year")) %>% 
    mutate(val = replace_na(val, 0))
# A tibble: 12 x 3
   type    year   val
   <chr>  <dbl> <int>
 1 apple   2010     1
 2 apple   2011     2
 3 apple   2012     3
 4 banana  2010     0
 5 banana  2011     0
 6 banana  2012     0
 7 orange  2010     4
 8 orange  2011     5
 9 orange  2012     6
10 pear    2010     7
11 pear    2011     0
12 pear    2012     8

CodePudding user response:

Make type a factor with banana as a level, then complete will work as you expect:

library(dplyr)
library(tidyr)

df %>%
  mutate(type = factor(type, levels = c(unique(type), "banana"))) %>%
  complete(type, year, fill = list(val = 0))

# A tibble: 12 × 3
   type    year   val
   <fct>  <dbl> <int>
 1 apple   2010     1
 2 apple   2011     2
 3 apple   2012     3
 4 orange  2010     4
 5 orange  2011     5
 6 orange  2012     6
 7 pear    2010     7
 8 pear    2011     0
 9 pear    2012     8
10 banana  2010     0
11 banana  2011     0
12 banana  2012     0
  • Related