Count 2 factors regardless of order

Time:04-19

How can I perform pairwise counts on 2 factor columns, regardless of order? Both columns contain many identical elements.

dplyr's group_by() %>% count() and group_by() %>% tally() count ordered pairs (permutations), so (A, B) and (B, A) are tallied separately.

Is there an option or method to perform combination counts instead?

Input dataframe:

Factor1 <- c('A','A','B','B','C','B','D')
Factor2 <- c('B','B','A','C','B','B','E')
DF <- data.frame(Factor1,Factor2)

Desired result:

CoFactors <- c('AB','BC','BB','DE')
n <- c(3,2,1,1)
Result <- data.frame(CoFactors,n)

CodePudding user response:

In base R:

data.frame(table(apply(DF, 1, function(x) paste0(sort(x), collapse = ''))))
  Var1 Freq
1   AB    3
2   BB    1
3   BC    2
4   DE    1

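Or with dplyr, using pmin() and pmax() to put each pair in a fixed order:

library(dplyr)

DF %>%
  # pmin()/pmax() put each pair in alphabetical order, so (B, A) becomes (A, B)
  mutate(Factor = pmin(Factor1, Factor2),
         Factor2 = pmax(Factor1, Factor2)) %>%
  group_by(Factor, Factor2) %>%
  count()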

# A tibble: 4 x 3
# Groups:   Factor, Factor2 [4]
  Factor Factor2     n
  <chr>  <chr>   <int>
1 A      B           3
2 B      B           1
3 B      C           2
4 D      E           1
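
Neither answer returns the exact shape asked for in the question (a single CoFactors column plus n), so here is a minimal base R sketch that might get closer. It assumes the columns could be true factors, which is why stringsAsFactors = TRUE is set here (that argument is an addition for illustration, not part of the original input), and it converts them to character first because pmin() and pmax() do not work on unordered factors.

```r
Factor1 <- c('A','A','B','B','C','B','D')
Factor2 <- c('B','B','A','C','B','B','E')
# stringsAsFactors = TRUE is an assumption, used to show the factor case
DF <- data.frame(Factor1, Factor2, stringsAsFactors = TRUE)

# Convert to character so pmin()/pmax() can compare the values
f1 <- as.character(DF$Factor1)
f2 <- as.character(DF$Factor2)

# Put each pair in alphabetical order, then paste it into one key
CoFactors <- paste0(pmin(f1, f2), pmax(f1, f2))

# Count the keys and sort by descending frequency
counts <- sort(table(CoFactors), decreasing = TRUE)
Result <- data.frame(CoFactors = names(counts), n = as.integer(counts))
Result
```

The as.character() step is harmless if the columns are already character vectors, so the same sketch should work either way.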