count number of combinations by group

Time:02-19

I am struggling to count the number of unique combinations in my data. I would like to first group the rows by id and then count how many times each combination of values occurs. Here, it does not matter whether the elements are combined as 'd-f' or 'f-d'; they still belong to the same category, because they contain the same elements:

combinations: 

       n
c-f:   2   # also f-c
c-d-f: 1   # also c-f-d or f-d-c
d-f:   2   # also f-d. The dash is only for visualization purposes

Dummy example:

# my data
dd <- data.frame(id = c(1,1,2,2,2,3,3,4, 4, 5,5),
             cat = c('c','f','c','d','f','c','f', 'd', 'f', 'f', 'd'))



> dd
   id cat
1   1   c
2   1   f
3   2   c
4   2   d
5   2   f
6   3   c
7   3   f
8   4   d
9   4   f
10  5   f
11  5   d

Using paste is a great solution provided by @benson23, but it treats f-d and d-f as two different categories. I would like the order not to matter. Thank you!
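To make the problem concrete, here is a minimal base R sketch of what happens when the categories are pasted in the order they are stored. This is only an illustration under my own assumptions, not @benson23's actual code; the names `unsorted` and `unsorted_counts` are made up for this example.

```r
# Same dummy data as above
dd <- data.frame(id = c(1,1,2,2,2,3,3,4,4,5,5),
                 cat = c('c','f','c','d','f','c','f','d','f','f','d'))

# Paste the categories of each id in the order they appear in the data
# (as.character() guards against cat being a factor in older R versions)
unsorted <- tapply(as.character(dd$cat), dd$id, paste, collapse = "-")

# Count each pasted string; ids 4 and 5 hold the same elements,
# but "d-f" and "f-d" are counted as separate categories
unsorted_counts <- table(unsorted)
print(unsorted_counts)
```

With this data, ids 4 and 5 land in different categories even though both contain d and f, which is exactly what I want to avoid.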

Answer:

Create a "combination" column inside summarise(), then count the values of that column.

An easy way to make the order irrelevant is to sort the categories first, so that within every id they are pasted in the same order.

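library(dplyr)

dd %>% 
  group_by(id) %>% 
  # sort by id, then by category, so every id lists its categories in the same order
  arrange(id, cat) %>% 
  # collapse each id's categories into a single string such as "c-d-f"
  summarize(combination = paste0(cat, collapse = "-"), .groups = "drop") %>% 
  # count how many ids share each combination
  count(combination)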

# A tibble: 3 x 2
  combination     n
  <chr>       <int>
1 c-d-f           1
2 c-f             2
3 d-f             2
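If dplyr is not available, the same idea (sort within each id, paste, then count) can probably be sketched in base R as follows. The names `combos` and `combo_counts` are hypothetical, and the call to `unique()` is an extra assumption that a category repeated within one id should count only once; drop it if repeats should be kept.

```r
# Same dummy data as in the question
dd <- data.frame(id = c(1,1,2,2,2,3,3,4,4,5,5),
                 cat = c('c','f','c','d','f','c','f','d','f','f','d'))

# For each id: sort the (unique) categories, then paste them into one string,
# so "f-d" and "d-f" both become "d-f"
combos <- tapply(as.character(dd$cat), dd$id,
                 function(x) paste(sort(unique(x)), collapse = "-"))

# Count how many ids share each combination and return it as a data frame
combo_counts <- as.data.frame(table(combination = combos), responseName = "n")
print(combo_counts)
```

On the dummy data this should reproduce the counts from the dplyr version above.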