I have a dataset with 8 variables:
Var1 <- c(1,0,1,0,1)
Var2 <- c(1,0,1,0,1)
Var3 <- c(1,1,1,0,1)
Var4 <- c(1,0,1,1,1)
Var5 <- c(1,0,0,0,1)
Var6 <- c(1,0,1,0,1)
Var7 <- c(1,1,1,0,1)
Var8 <- c(0,0,0,0,1)
DF <- data.frame(Var1, Var2, Var3, Var4, Var5, Var6, Var7, Var8)
DF
which results in:
  Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8
1    1    1    1    1    1    1    1    0
2    0    0    1    0    0    0    1    0
3    1    1    1    1    0    1    1    0
4    0    0    0    1    0    0    0    0
5    1    1    1    1    1    1    1    1
Each row represents a person who participated in a study, and each person can give multiple answers. Person 1, for example, answered every question with "yes" (1) except question 8 (Var8 = 0). Person 2 answered only questions 3 and 7 with "yes", and so on.
I want to find the frequency distribution for every single combination of answers for the variables Var1 to Var6. In other words: how many people answered only Var1 and Var2 with "yes" (1), how many answered Var1 and Var4, how many answered Var5, Var6 and Var7, and so on.
So far I tried:
DF %>%
  filter(across(Var1:Var3) == 1) %>%
  count(Var1, Var2, Var3)
for one of the variable combinations (Var1, Var2, Var3).
Is there a way to calculate this without going through every single combination by hand and selecting, filtering and counting each one? Thanks in advance.
CodePudding user response:
We may use combn to count, for each pair of variables, how many people answered both with 1:
v1 <- combn(DF, 2, FUN = function(x) sum(Reduce(`&`, x)))
names(v1) <- combn(names(DF), 2, FUN = paste, collapse="_")
Output:
> v1
Var1_Var2 Var1_Var3 Var1_Var4 Var1_Var5 Var1_Var6 Var1_Var7 Var1_Var8 Var2_Var3 Var2_Var4 Var2_Var5 Var2_Var6 Var2_Var7 Var2_Var8 Var3_Var4 Var3_Var5
        3         3         3         2         3         3         1         3         3         2         3         3         1         3         2
Var3_Var6 Var3_Var7 Var3_Var8 Var4_Var5 Var4_Var6 Var4_Var7 Var4_Var8 Var5_Var6 Var5_Var7 Var5_Var8 Var6_Var7 Var6_Var8 Var7_Var8
        3         4         1         2         3         3         1         2         2         1         3         1         1
If we need combinations of 2 to 5 variables, use lapply:
lst1 <- lapply(2:5, function(i) {
  v1 <- combn(DF, i, FUN = function(x) sum(Reduce(`&`, x)))
  names(v1) <- combn(names(DF), i, FUN = paste, collapse = "_")
  v1
})
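To look up one particular combination afterwards, the list can be flattened into a single named vector. This is only a minimal sketch, assuming the 5-person DF from the question. The name all_counts is made up here for illustration and is not part of the answer above. Note that these counts include people who also said "yes" to other questions, so they are "at least these variables" counts, not "only these variables" counts.

```r
# The 5-person data from the question, rebuilt so the sketch runs on its own
DF <- data.frame(
  Var1 = c(1, 0, 1, 0, 1), Var2 = c(1, 0, 1, 0, 1),
  Var3 = c(1, 1, 1, 0, 1), Var4 = c(1, 0, 1, 1, 1),
  Var5 = c(1, 0, 0, 0, 1), Var6 = c(1, 0, 1, 0, 1),
  Var7 = c(1, 1, 1, 0, 1), Var8 = c(0, 0, 0, 0, 1)
)

# Same lapply/combn approach as above: one named vector per combination size
lst1 <- lapply(2:5, function(i) {
  v1 <- combn(DF, i, FUN = function(x) sum(Reduce(`&`, x)))
  names(v1) <- combn(names(DF), i, FUN = paste, collapse = "_")
  v1
})

# all_counts is a hypothetical helper name: every combination in one vector
all_counts <- unlist(lst1)

# Number of people who answered Var1, Var2 and Var3 all with 1
all_counts["Var1_Var2_Var3"]

# The most frequent combinations first
head(sort(all_counts, decreasing = TRUE))
```

Because the names are built with paste(..., collapse = "_"), a combination is always addressed with its variables in column order, e.g. "Var1_Var2_Var3" and not "Var3_Var1_Var2".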
CodePudding user response:
Var1 <- c(1,0,1,0,1,1,1)
Var2 <- c(1,0,1,0,1,1,1)
Var3 <- c(1,1,1,0,1,1,1)
Var4 <- c(1,0,1,1,1,1,1)
Var5 <- c(1,0,0,0,1,1,1)
Var6 <- c(1,0,1,0,1,1,1)
Var7 <- c(1,1,1,0,1,1,1)
Var8 <- c(0,0,0,0,1,0,1)
DF <- data.frame(Var1, Var2, Var3, Var4, Var5, Var6, Var7, Var8)
library(dplyr)

# Count how many people share each exact answer pattern (one output row per distinct pattern)
DF %>%
  group_by(Var1, Var2, Var3, Var4, Var5, Var6, Var7, Var8) %>%
  count()
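The same "exact pattern" count can also be sketched in base R by labelling each person with the variables they answered with 1 and tabulating the labels. This is a rough alternative and not the answerer's code; the names is_yes, pattern and pattern_counts are assumptions introduced here for illustration. It uses the 7-person data from this answer.

```r
# The 7-person data from this answer, rebuilt so the sketch runs on its own
DF <- data.frame(
  Var1 = c(1, 0, 1, 0, 1, 1, 1), Var2 = c(1, 0, 1, 0, 1, 1, 1),
  Var3 = c(1, 1, 1, 0, 1, 1, 1), Var4 = c(1, 0, 1, 1, 1, 1, 1),
  Var5 = c(1, 0, 0, 0, 1, 1, 1), Var6 = c(1, 0, 1, 0, 1, 1, 1),
  Var7 = c(1, 1, 1, 0, 1, 1, 1), Var8 = c(0, 0, 0, 0, 1, 0, 1)
)

# Logical matrix: TRUE where a person answered "yes" (1)
is_yes <- as.matrix(DF) == 1

# One label per person, built from the names of the variables answered with 1
# (a person with no "yes" answers would get the empty string "")
pattern <- apply(is_yes, 1, function(r) paste(colnames(is_yes)[r], collapse = "_"))

# Frequency of each exact answer pattern, most common first
pattern_counts <- sort(table(pattern), decreasing = TRUE)
pattern_counts
```

Unlike the combn approach, each person is counted exactly once here, so the counts sum to the number of rows in DF.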