I have a table of comma separated codes that should be collapsed
group_codes <-
tibble::tribble(
~group_codes,
"AAA11, AAA1Z",
"ZZZ21, ZZZ2Z"
)
And a table on which the collapsing and summing should be run:
tibble::tribble(
~codes, ~values,
"AAA11", 10,
"AAA1Z", 20,
"CCC3", 34,
"ZZZ21", 10,
"ZZZ2Z", 30
)
The third row will stay intact, it is not in the list of codes to be collapsed.
The expected result is the collapse of codes in a list and sum their values.
tibble::tribble(
~codes, ~values,
"AAA11,AAA1Z", 30L,
"CCC3", 34L,
"ZZZ21, ZZZ2Z", 40L
)
I am expecting to run the collapse after a group-by. The codes will not repeat. They may in most cases follow each other but the order is unknown (data comes from a manual entry form. What would be best tidyverse approach?
CodePudding user response:
We may split the column in group_codes, do a join and summarise
library(dplyr)
library(tidyr)
group_codes %>%
mutate(codes = group_codes) %>%
separate_rows(codes) %>%
left_join(df1, ., by = "codes") %>%
transmute(codes = coalesce(group_codes, codes), values) %>%
group_by(codes) %>%
summarise(values = sum(values, na.rm = TRUE), .groups = 'drop')
-output
# A tibble: 3 × 2
codes values
<chr> <dbl>
1 AAA11, AAA1Z 30
2 CCC3 34
3 ZZZ21, ZZZ2Z 40