Is there a way to calculate the frequency of combinations from a set of binary variables?

a <- c(0,1,0,1,0)
b <- c(1,1,0,1,0)
c <- c(0,1,0,0,0)

> data.frame(a, b, c)
  a b c
1 0 1 0
2 1 1 1
3 0 0 0
4 1 1 0
5 0 0 0

In this example, the combination a b is the most common, since rows 2 and 4 both have it. I only want to count combinations in which at least 2 variables equal 1. Is there a way to calculate this? I would appreciate any thoughts or ideas!

My expected output should be like this:

  combinations
1 ab  2
2 ac  1
3 bc  1
4 abc 1

CodePudding user response:

Try this:

> X <- data.frame(a, b, c)
> apply(model.matrix(data=X, ~a*b*c), 2, sum)[-(1:4)]

 a:b   a:c   b:c a:b:c 
    2     1     1     1

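model.matrix encodes every interaction term for each row of your dataset; apply with MARGIN = 2 then sums each column, which gives the number of rows where all variables in that interaction equal 1. The first four elements are the intercept and the main effects a, b, and c, which you don't need, so they are dropped with [-(1:4)].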
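To make the dropped columns visible, here is a minimal sketch of the same idea that prints the column names of the model matrix and selects the interaction columns by name rather than by position. Using colSums and matching on ":" is my own variation, not part of the answer above; the variable names mm and counts are just for illustration.

```r
a <- c(0, 1, 0, 1, 0)
b <- c(1, 1, 0, 1, 0)
c <- c(0, 1, 0, 0, 0)
X <- data.frame(a, b, c)

# One column per term: intercept, main effects, then the interactions
mm <- model.matrix(~ a * b * c, data = X)
colnames(mm)

# Keep only the interaction columns (their names contain ":") and sum them;
# an interaction column is 1 only in rows where all of its variables are 1
counts <- colSums(mm)[grepl(":", colnames(mm), fixed = TRUE)]
counts
```

Selecting by name avoids hard-coding the number of main effects, which would change if more variables were added.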

CodePudding user response:

Maybe this could help

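unlist(
  lapply(
    2:3, # combination sizes: pairs, then the triple
    function(k) {
      setNames(
        # the product of the chosen columns is 1 only in rows where all are 1
        combn(df, k, function(x) sum(Reduce("*", x))),
        combn(names(df), k, toString)
      )
    }
  )
)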

which gives

   a, b    a, c    b, c a, b, c 
      2       1       1       1

data

df <- data.frame(a, b, c)
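
As a small illustration of why the product works as a counter, here is a sketch for a single pair of columns. It assumes the same 0/1 data as in the question; the name ab is only for illustration.

```r
df <- data.frame(
  a = c(0, 1, 0, 1, 0),
  b = c(1, 1, 0, 1, 0),
  c = c(0, 1, 0, 0, 0)
)

# Element-wise product of columns a and b: 1 only where both are 1
ab <- Reduce("*", df[c("a", "b")])
ab

# Number of rows containing the combination a, b
sum(ab)
```

Note that this only works when the columns are strictly 0/1; any other coding would need a comparison such as == 1 first.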

CodePudding user response:

Here is another option. First get all combinations of your data.frame names. Then, using lapply on each combination, use rowSums to count, per row, how many of the combination's variables equal 1; a row contains the combination when that count equals the combination's length. Summing over those rows gives the frequency.

res <- unlist(Map(combn, list(names(df)), 2:3, simplify = FALSE), recursive = FALSE)
unlist(lapply(res, function(x) {
  # count the rows in which every variable of combination x equals 1
  setNames(sum(rowSums(df[, x] == 1, na.rm = TRUE) == length(x)),
           paste0(x, collapse = ""))
}))

Output

 ab  ac  bc abc 
  2   1   1   1
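
If the result is wanted as a data.frame, as in the question's expected output, the same rowSums idea can be wrapped as in the sketch below. The count column name n is an assumption (the question does not show it), and 2:ncol(df) is used so the combination sizes are not hard-coded.

```r
df <- data.frame(
  a = c(0, 1, 0, 1, 0),
  b = c(1, 1, 0, 1, 0),
  c = c(0, 1, 0, 0, 0)
)

# All combinations of at least two column names, as a flat list
combos <- unlist(
  lapply(2:ncol(df), function(k) combn(names(df), k, simplify = FALSE)),
  recursive = FALSE
)

result <- data.frame(
  combinations = vapply(combos, paste0, character(1), collapse = ""),
  # rows where every variable of the combination equals 1
  n = vapply(combos, function(v) sum(rowSums(df[v] == 1) == length(v)), integer(1)),
  stringsAsFactors = FALSE
)
result
```

As in the answers above, a row with a, b and c all equal to 1 counts toward ab, ac, bc and abc alike.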