I have two columns I'm trying to complete and expand at the same time. Here's a sample data set.
library(tibble)
library(dplyr)
library(tidyr)
# Sample data
df <- tibble(
type = c("apple", "apple", "apple", "orange", "orange", "orange", "pear", "pear"),
year = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2012),
val = c(1:8))
df
# A tibble: 8 x 3
type year val
<chr> <dbl> <int>
1 apple 2010 1
2 apple 2011 2
3 apple 2012 3
4 orange 2010 4
5 orange 2011 5
6 orange 2012 6
7 pear 2010 7
8 pear 2012 8
First, the type
"pear" is missing the year "2011". Additionally, type
is missing one value that COULD be in the data set but currently isn't. This missing value of type
is "banana". I'd like to include "banana" while also filling in the missing years (2010:2012
) associated with all the types.
As of now, I can only do one or the other. I'd think there's a way to do both. The problem with the fill
argument in complete()
is that it only allows a single value to fill in missing elements.
# Want to complete and expand
# Missing year 2011 in "pear" type and missing "banana" type so want to include and fill years 2010:2012
# complete
df %>%
complete(type = c("apple", "orange", "pear", "banana"),
fill = list(val = 0))
# A tibble: 9 x 3
type year val
<chr> <dbl> <int>
1 apple 2010 1
2 apple 2011 2
3 apple 2012 3
4 banana NA 0
5 orange 2010 4
6 orange 2011 5
7 orange 2012 6
8 pear 2010 7
9 pear 2012 8
# expand
df %>%
expand(type = c("apple", "orange", "pear", "banana"), year)
# A tibble: 12 x 2
type year
<chr> <dbl>
1 apple 2010
2 apple 2011
3 apple 2012
4 banana 2010
5 banana 2011
6 banana 2012
7 orange 2010
8 orange 2011
9 orange 2012
10 pear 2010
11 pear 2011
12 pear 2012
My expected output is:
# A tibble: 12 x 3
type year val
<chr> <dbl> <dbl>
1 apple 2010 1
2 apple 2011 2
3 apple 2012 3
4 orange 2010 4
5 orange 2011 5
6 orange 2012 6
7 pear 2010 7
8 pear 2011 0
9 pear 2012 8
10 banana 2010 0
11 banana 2011 0
12 banana 2012 0
I could reference df
twice like the following, but I'd like to find a way to not have to do that if possible.
df %>%
expand(type = c("apple", "orange", "pear", "banana"), year) %>%
left_join(df, by = c("type", "year")) %>%
mutate(val = replace_na(val, 0))
# A tibble: 12 x 3
type year val
<chr> <dbl> <int>
1 apple 2010 1
2 apple 2011 2
3 apple 2012 3
4 banana 2010 0
5 banana 2011 0
6 banana 2012 0
7 orange 2010 4
8 orange 2011 5
9 orange 2012 6
10 pear 2010 7
11 pear 2011 0
12 pear 2012 8
CodePudding user response:
Make type
a factor with banana
as a level, then complete will work as you expect:
library(dplyr)
library(tidyr)
df %>%
mutate(type = factor(type, levels = c(unique(type), "banana"))) %>%
complete(type, year, fill = list(val = 0))
# A tibble: 12 × 3
type year val
<fct> <dbl> <int>
1 apple 2010 1
2 apple 2011 2
3 apple 2012 3
4 orange 2010 4
5 orange 2011 5
6 orange 2012 6
7 pear 2010 7
8 pear 2011 0
9 pear 2012 8
10 banana 2010 0
11 banana 2011 0
12 banana 2012 0