How can I count pairs of values across 2 factor columns regardless of order, so that (A, B) and (B, A) are counted as the same pair? Both columns contain many identical elements.
dplyr's group_by() %>% count() and group_by() %>% tally() count ordered pairs (permutations), so (A, B) and (B, A) end up in separate groups.
Is there an option or method to count unordered pairs (combinations) instead?
Input dataframe:
Factor1 <- c('A','A','B','B','C','B','D')
Factor2 <- c('B','B','A','C','B','B','E')
DF <- data.frame(Factor1,Factor2)
Desired result:
CoFactors <- c('AB','BC','BB','DE')
n <- c(3,2,1,1)
Result <- data.frame(CoFactors,n)
CodePudding user response:
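In base R, sort each row so the order within a pair no longer matters, paste it into a single key, and tabulate the keys:
data.frame(table(apply(DF, 1, function(x) paste0(sort(x), collapse = ''))))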
  Var1 Freq
1   AB    3
2   BB    1
3   BC    2
4   DE    1
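Or with dplyr, using pmin() and pmax() to put each pair in alphabetical order before grouping:
library(dplyr)

# Assumes character columns (the data.frame() default since R 4.0);
# pmin()/pmax() are not meaningful for factors, so convert those with as.character() first.
DF %>%
  mutate(Factor = pmin(Factor1, Factor2),      # alphabetically first value of the pair
         Factor2 = pmax(Factor1, Factor2)) %>% # alphabetically last value of the pair
  group_by(Factor, Factor2) %>%
  count()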
# A tibble: 4 x 3
# Groups:   Factor, Factor2 [4]
  Factor Factor2     n
  <chr>  <chr>   <int>
1 A      B           3
2 B      B           1
3 B      C           2
4 D      E           1
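If the output should have the same shape as the Result data frame in the question (a single CoFactors column plus n, most frequent pair first), the two ideas above can be combined. This is only a minimal base R sketch: the helper names f1, f2, key and counts are made up for illustration, and the as.character() calls are there on the assumption that the real columns may be factors.

```r
Factor1 <- c('A','A','B','B','C','B','D')
Factor2 <- c('B','B','A','C','B','B','E')
DF <- data.frame(Factor1, Factor2)

# Convert to character first: pmin()/pmax() are not meaningful for factors.
f1 <- as.character(DF$Factor1)
f2 <- as.character(DF$Factor2)

# Put each pair in alphabetical order, then paste it into a single key,
# so that ('A','B') and ('B','A') both become 'AB'.
key <- paste0(pmin(f1, f2), pmax(f1, f2))

# Tabulate the keys and sort by count, most frequent first.
counts <- sort(table(key), decreasing = TRUE)

Result <- data.frame(CoFactors = names(counts), n = as.integer(counts))
Result
```

Because pmin() and pmax() are vectorised, this avoids the row-by-row apply() call, which may matter on larger data frames.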