Create combinations by group and sum-CodePudding

I have data of names within an ID number along with a number of associated values. It looks something like this:

structure(list(id = c("a", "a", "b", "b"), name = c("bob", "jane", 
"mark", "brittney"), number = c(1L, 2L, 1L, 2L), value = c(1L, 
2L, 1L, 2L)), class = "data.frame", row.names = c(NA, -4L))

#   id     name number value
# 1  a      bob      1     1
# 2  a     jane      2     2
# 3  b     mark      1     1
# 4  b brittney      2     2

I would like to create all the combinations of name, regardless of how many there are, and paste them together separated with commas, and sum their number and value within each id. The desired output from the example above is then:

structure(list(id = c("a", "a", "a", "b", "b", "b"), name = c("bob", 
"jane", "bob, jane", "mark", "brittney", "mark, brittney"), number = c(1L, 
2L, 3L, 1L, 2L, 3L), value = c(1L, 2L, 3L, 1L, 2L, 3L)), class = "data.frame", row.names = c(NA, -6L))

#   id           name number value
# 1  a            bob      1     1
# 2  a           jane      2     2
# 3  a      bob, jane      3     3
# 4  b           mark      1     1
# 5  b       brittney      2     2
# 6  b mark, brittney      3     3

Thanks all!

CodePudding user response：

You could use group_modify() add_row():

library(dplyr)

df %>%
  group_by(id) %>%
  group_modify( ~ .x %>%
    summarise(name = toString(name), across(c(number, value), sum)) %>%
    add_row(.x, .)
  ) %>%
  ungroup()

# # A tibble: 6 × 4
#   id    name           number value
#   <chr> <chr>           <int> <int>
# 1 a     bob                 1     1
# 2 a     jane                2     2
# 3 a     bob, jane           3     3
# 4 b     mark                1     1
# 5 b     brittney            2     2
# 6 b     mark, brittney      3     3

CodePudding user response：

You can create pairwise indices using combn() and expand the data frame with these using slice(). Then just group by these row pairs and summarise. I'm assuming you want pairwise combinations but this can be adapted for larger sets if needed. Some code to handle groups < 2 is included but can be removed if these don't exist in your data.

library(dplyr)

df1 %>%
  group_by(id) %>%
  slice(c(combn(seq(n()), min(n(), 2)))) %>%
  mutate(id2 = (row_number()-1) %/% 2) %>%
  group_by(id, id2) %>%
  summarise(name = toString(name),
            across(where(is.numeric), sum), .groups = "drop") %>%
  select(-id2) %>%
  bind_rows(df1 %>%
              group_by(id) %>%
              filter(n() > 1), .) %>%
  arrange(id) %>%
  ungroup()

# A tibble: 6 × 4
  id    name           number value
  <chr> <chr>           <int> <int>
1 a     bob                 1     1
2 a     jane                2     2
3 a     bob, jane           3     3
4 b     mark                1     1
5 b     brittney            2     2
6 b     mark, brittney      3     3