Home > database >  R dplyr: How to add extra rows to a df using group info and missing entries of a particular column f
R dplyr: How to add extra rows to a df using group info and missing entries of a particular column f

Time:07-12

I have:

group items value
grp1 A 1
grp1 B 2
grp2 B 3

I want:

group items value
grp1 A 1
grp1 B 2
grp1 C NA
grp2 A NA
grp2 B 3
grp2 C NA

"group" is taken from the input df. "items" is taken from a codelist vector with all possible entries, all other columns are filled in where known or else NA.

Example:

item_codelist <- c("A", "B", "C")

input <- data.frame("group" = c("grp1", "grp1", "grp2"), "items" = c("A", "B", "B"), "values" = c(1, 2, 3))

I looked into fill(), extend() and complete() but could not get any of these to work for this purpose.

Below is my current workaround but I find it somewhat complicated and I am using a for loop which will take forever for my 200 MB data frame...

If you know an easier way to do this (preferably in dplyr syntax) let me know. Thanks!


# create a data frame with all groups and items
codelist_df <- input %>% head(0) %>% select(group, items)
for (grp in unique(input$group)){
  df <- data.frame("items" = item_codelist) %>%
    mutate( group = grp, .before = 1)
  codelist_df <- bind_rows(codelist_df, df)
}

# join that data frame to the input data
output <- input %>%
  group_by(group) %>%
  full_join(codelist_df) %>%
  arrange(group, items)

CodePudding user response:

Stefan's comment is by far the best solution, which I was unaware of, but here's one option:

library(dplyr)
library(purrr)
library(tidyr)

input <- data.frame("group" = c("grp1", "grp1", "grp2"), "items" = c("A", "B", "B"), "values" = c(1, 2, 3))

items <- c("A", "B", "C") 

input %>% 
  split(.$group) %>% 
  map_df(~full_join(., as_tibble(items), by = c("items" = "value")) %>% 
           arrange(items)) %>% 
  fill(group, .direction = 'down')
#>   group items values
#> 1  grp1     A      1
#> 2  grp1     B      2
#> 3  grp1     C     NA
#> 4  grp1     A     NA
#> 5  grp2     B      3
#> 6  grp2     C     NA

CodePudding user response:

It seemse like you want to cross join the groups and items. To do that, you could use dplyr::full_join() with the argument by = character(), and then left join the values back in:

library(dplyr, warn.conflicts = FALSE)

item_codelist <- tibble(items = c('A', 'B', 'C'))

groups <- tibble(group = c('grp1', 'grp1', 'grp2'))

input <- tibble("group" = c("grp1", "grp1", "grp2"), "items" = c("A", "B", "B"), "values" = c(1, 2, 3))

item_codelist |> 
  full_join(groups, by = character()) |> 
  left_join(input, by = c('items', 'group')) |> 
  relocate(group) |> 
  arrange(group, items) |> 
  distinct()

#> # A tibble: 6 × 3
#>   group items values
#>   <chr> <chr>  <dbl>
#> 1 grp1  A          1
#> 2 grp1  B          2
#> 3 grp1  C         NA
#> 4 grp2  A         NA
#> 5 grp2  B          3
#> 6 grp2  C         NA

Created on 2022-07-11 by the reprex package (v2.0.1)

  • Related