I have:
group | items | value |
---|---|---|
grp1 | A | 1 |
grp1 | B | 2 |
grp2 | B | 3 |
I want:
group | items | value |
---|---|---|
grp1 | A | 1 |
grp1 | B | 2 |
grp1 | C | NA |
grp2 | A | NA |
grp2 | B | 3 |
grp2 | C | NA |
"group" is taken from the input df. "items" is taken from a codelist vector with all possible entries, all other columns are filled in where known or else NA.
Example:
item_codelist <- c("A", "B", "C")
input <- data.frame("group" = c("grp1", "grp1", "grp2"), "items" = c("A", "B", "B"), "values" = c(1, 2, 3))
I looked into fill(), extend() and complete() but could not get any of these to work for this purpose.
Below is my current workaround but I find it somewhat complicated and I am using a for loop which will take forever for my 200 MB data frame...
If you know an easier way to do this (preferably in dplyr syntax) let me know. Thanks!
# create a data frame with all groups and items
codelist_df <- input %>% head(0) %>% select(group, items)
for (grp in unique(input$group)){
df <- data.frame("items" = item_codelist) %>%
mutate( group = grp, .before = 1)
codelist_df <- bind_rows(codelist_df, df)
}
# join that data frame to the input data
output <- input %>%
group_by(group) %>%
full_join(codelist_df) %>%
arrange(group, items)
CodePudding user response:
Stefan's comment is by far the best solution, which I was unaware of, but here's one option:
library(dplyr)
library(purrr)
library(tidyr)
input <- data.frame("group" = c("grp1", "grp1", "grp2"), "items" = c("A", "B", "B"), "values" = c(1, 2, 3))
items <- c("A", "B", "C")
input %>%
split(.$group) %>%
map_df(~full_join(., as_tibble(items), by = c("items" = "value")) %>%
arrange(items)) %>%
fill(group, .direction = 'down')
#> group items values
#> 1 grp1 A 1
#> 2 grp1 B 2
#> 3 grp1 C NA
#> 4 grp1 A NA
#> 5 grp2 B 3
#> 6 grp2 C NA
CodePudding user response:
It seemse like you want to cross join the groups and items. To do that, you could use dplyr::full_join()
with the argument by = character()
, and then left join the values back in:
library(dplyr, warn.conflicts = FALSE)
item_codelist <- tibble(items = c('A', 'B', 'C'))
groups <- tibble(group = c('grp1', 'grp1', 'grp2'))
input <- tibble("group" = c("grp1", "grp1", "grp2"), "items" = c("A", "B", "B"), "values" = c(1, 2, 3))
item_codelist |>
full_join(groups, by = character()) |>
left_join(input, by = c('items', 'group')) |>
relocate(group) |>
arrange(group, items) |>
distinct()
#> # A tibble: 6 × 3
#> group items values
#> <chr> <chr> <dbl>
#> 1 grp1 A 1
#> 2 grp1 B 2
#> 3 grp1 C NA
#> 4 grp2 A NA
#> 5 grp2 B 3
#> 6 grp2 C NA
Created on 2022-07-11 by the reprex package (v2.0.1)