I have a dataset with 6 variables:
Var1 <- c(1,0,1,0,1)
Var2 <- c(1,0,1,0,1)
Var3 <- c(1,1,1,0,1)
Var4 <- c(1,0,1,1,1)
Var5 <- c(1,0,0,0,1)
Var6 <- c(1,0,1,0,1)
DF <- data.frame(Var1, Var2, Var3, Var4, Var5, Var6)
DF
which results in
  Var1 Var2 Var3 Var4 Var5 Var6
1    1    1    1    1    1    1
2    0    0    1    0    0    0
3    1    1    1    1    0    1
4    0    0    0    1    0    0
5    1    1    1    1    1    1
I want to find all possible variable combinations: how many 2-variable combinations (e.g. Var1Var2, Var2Var4, Var5Var4, etc.), 3-variable combinations, 4-variable combinations, and so on do I have? Is there a way to calculate this?
Thanks.
CodePudding user response:
Try this to get the number of combinations of each size, from 2 variables up to all 6:
> choose(length(DF), 2:length(DF))
[1] 15 20 15 6 1
or, to list the combinations themselves (one matrix per size, one combination per column):
> lapply(
    2:length(DF),
    combn,
    x = names(DF)
  )
[[1]]
     [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]   [,10]
[1,] "Var1" "Var1" "Var1" "Var1" "Var1" "Var2" "Var2" "Var2" "Var2" "Var3"
[2,] "Var2" "Var3" "Var4" "Var5" "Var6" "Var3" "Var4" "Var5" "Var6" "Var4"
     [,11]  [,12]  [,13]  [,14]  [,15]
[1,] "Var3" "Var3" "Var4" "Var4" "Var5"
[2,] "Var5" "Var6" "Var5" "Var6" "Var6"

[[2]]
     [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]   [,10]
[1,] "Var1" "Var1" "Var1" "Var1" "Var1" "Var1" "Var1" "Var1" "Var1" "Var1"
[2,] "Var2" "Var2" "Var2" "Var2" "Var3" "Var3" "Var3" "Var4" "Var4" "Var5"
[3,] "Var3" "Var4" "Var5" "Var6" "Var4" "Var5" "Var6" "Var5" "Var6" "Var6"
     [,11]  [,12]  [,13]  [,14]  [,15]  [,16]  [,17]  [,18]  [,19]  [,20]
[1,] "Var2" "Var2" "Var2" "Var2" "Var2" "Var2" "Var3" "Var3" "Var3" "Var4"
[2,] "Var3" "Var3" "Var3" "Var4" "Var4" "Var5" "Var4" "Var4" "Var5" "Var5"
[3,] "Var4" "Var5" "Var6" "Var5" "Var6" "Var6" "Var5" "Var6" "Var6" "Var6"

[[3]]
     [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]   [,10]
[1,] "Var1" "Var1" "Var1" "Var1" "Var1" "Var1" "Var1" "Var1" "Var1" "Var1"
[2,] "Var2" "Var2" "Var2" "Var2" "Var2" "Var2" "Var3" "Var3" "Var3" "Var4"
[3,] "Var3" "Var3" "Var3" "Var4" "Var4" "Var5" "Var4" "Var4" "Var5" "Var5"
[4,] "Var4" "Var5" "Var6" "Var5" "Var6" "Var6" "Var5" "Var6" "Var6" "Var6"
     [,11]  [,12]  [,13]  [,14]  [,15]
[1,] "Var2" "Var2" "Var2" "Var2" "Var3"
[2,] "Var3" "Var3" "Var3" "Var4" "Var4"
[3,] "Var4" "Var4" "Var5" "Var5" "Var5"
[4,] "Var5" "Var6" "Var6" "Var6" "Var6"

[[4]]
     [,1]   [,2]   [,3]   [,4]   [,5]   [,6]
[1,] "Var1" "Var1" "Var1" "Var1" "Var1" "Var2"
[2,] "Var2" "Var2" "Var2" "Var2" "Var3" "Var3"
[3,] "Var3" "Var3" "Var3" "Var4" "Var4" "Var4"
[4,] "Var4" "Var4" "Var5" "Var5" "Var5" "Var5"
[5,] "Var5" "Var6" "Var6" "Var6" "Var6" "Var6"

[[5]]
     [,1]
[1,] "Var1"
[2,] "Var2"
[3,] "Var3"
[4,] "Var4"
[5,] "Var5"
[6,] "Var6"
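If you would rather have labels like Var1Var2 (as written in the question) plus a grand total, here is a minimal sketch. It assumes that pasting the names together without a separator is what you want; `labels_by_size` and `all_labels` are names made up for this example.

```r
# Rebuild the example data so the snippet runs on its own
DF <- data.frame(
  Var1 = c(1, 0, 1, 0, 1), Var2 = c(1, 0, 1, 0, 1), Var3 = c(1, 1, 1, 0, 1),
  Var4 = c(1, 0, 1, 1, 1), Var5 = c(1, 0, 0, 0, 1), Var6 = c(1, 0, 1, 0, 1)
)

n <- length(DF)      # number of variables (columns)
sizes <- 2:n         # combination sizes to consider

# Number of combinations per size, and the total over all sizes
counts <- choose(n, sizes)
total <- sum(counts)

# For each size, paste the variable names of every combination into one label;
# combn() passes `collapse = ""` on to paste()
labels_by_size <- lapply(sizes, function(m) {
  combn(names(DF), m, FUN = paste, collapse = "")
})
names(labels_by_size) <- paste0("size_", sizes)
all_labels <- unlist(labels_by_size, use.names = FALSE)

print(counts)
print(total)
print(head(all_labels))
```

The total equals 2^n - n - 1, because it counts every subset of the n variables except the empty set and the n single-variable subsets.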
CodePudding user response:
Well, since all variables in your case are binary, the number of possible value combinations for k variables is just 2^k.
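A quick sanity check for the binary case, assuming the count meant here is 2^k (two possible values for each of k variables). The sketch enumerates every 0/1 pattern with expand.grid and compares the row count to 2^k; `k` and `grid` are names chosen for this example.

```r
k <- 6  # number of binary variables, as in the question

# Every possible 0/1 pattern across k variables, one pattern per row
grid <- expand.grid(rep(list(0:1), k))

print(nrow(grid))  # number of enumerated patterns
print(2^k)         # closed-form count to compare against
```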
To count the combinations for non-binary variables as well, you can use the function expand.grid and then count the number of rows. expand.grid produces duplicate rows whenever a variable contains repeated values, so to avoid double counting, count only the unique rows. Here is a simple example:
> library(dplyr)
> var1 <- c(1,2,2,3,5)
> var2 <- c(1,1,1,2,3)
> expand.grid(var1, var2) %>% unique %>% nrow
[1] 12
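The same count can be cross-checked without dplyr. This is a sketch using base R only: after removing duplicates, the number of rows equals the product of the number of distinct values per variable. The last line counts something different, namely the combinations that actually occur together in the data; the helper names (`vars`, `possible`, `possible_prod`, `observed`) are made up for this example.

```r
var1 <- c(1, 2, 2, 3, 5)
var2 <- c(1, 1, 1, 2, 3)
vars <- list(var1, var2)

# Possible combinations: unique rows of the full grid (base R, no pipe)
possible <- nrow(unique(expand.grid(var1, var2)))

# Same number as the product of distinct values per variable
possible_prod <- prod(sapply(vars, function(v) length(unique(v))))

# Combinations that are actually observed row by row in the data
observed <- nrow(unique(data.frame(var1, var2)))

print(c(possible = possible, possible_prod = possible_prod, observed = observed))
```

Which of the two numbers you need depends on whether you are asking what could occur or what does occur in your dataset.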