Binary Variables Combinations Analysis in R

Time:10-11

I have a data set with many binary variables. For ease of illustration, here is a smaller version with only 4 variables:

set.seed(5)
my_data <- data.frame("Slept Well" = sample(c(0, 1), 10, TRUE),
                      "Had Breakfast" = sample(c(0, 1), 10, TRUE),
                      "Worked out" = sample(c(0, 1), 10, TRUE),
                      "Meditated" = sample(c(0, 1), 10, TRUE))

In the above, each row corresponds to an observation. I am interested in analysing the frequency of each unique combination of the variables. For example, how many observations both slept well and meditated, but did not have breakfast or work out?

I would like to be able to rank the unique combinations from most frequently occurring to the least frequently occurring. What is the best way to go about coding that up?

CodePudding user response:

What about a dplyr solution:

library(dplyr)

my_data %>%
  # group by every column, so each group is one unique combination
  group_by_all() %>%
  # count the rows in each group
  summarise(freq = n()) %>%
  # sort from most to least frequent
  arrange(-freq)

# A tibble: 9 x 5
  Slept.Well Had.Breakfast Worked.out Meditated  freq
  <chr>      <chr>         <chr>      <chr>     <int>
1 0          1             1          0             2
2 0          0             0          0             1
3 0          0             0          1             1
4 0          0             1          0             1
5 0          0             1          1             1
6 0          1             0          1             1
7 0          1             1          1             1
8 1          0             0          1             1
9 1          1             0          0             1
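
A shorter variant of the same idea, as a sketch rather than a definitive version: dplyr's count() groups, tallies and sorts in one call. It assumes the column names that data.frame() produces from the question's data (spaces replaced by dots); combo_counts is just an illustrative name.

```r
library(dplyr)

# Recreate the example data from the question
set.seed(5)
my_data <- data.frame("Slept Well" = sample(c(0, 1), 10, TRUE),
                      "Had Breakfast" = sample(c(0, 1), 10, TRUE),
                      "Worked out" = sample(c(0, 1), 10, TRUE),
                      "Meditated" = sample(c(0, 1), 10, TRUE))

# count() tallies each unique combination of the listed columns;
# sort = TRUE puts the most frequent first, name sets the count column
combo_counts <- my_data %>%
  count(Slept.Well, Had.Breakfast, Worked.out, Meditated,
        sort = TRUE, name = "freq")

combo_counts
```

With many columns, listing them by hand gets tedious, which is where the group_by_all() version above is more convenient.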

Or with data.table:

library(data.table)

res <- setorder(data.table(my_data)[, .(freq = .N), by = names(my_data)], -freq)
res
   Slept.Well Had.Breakfast Worked.out Meditated freq
1:          0             1          1         0    2
2:          1             0          0         1    1
3:          0             0          1         0    1
4:          0             0          0         0    1
5:          0             1          0         1    1
6:          0             1          1         1    1
7:          0             0          1         1    1
8:          0             0          0         1    1
9:          1             1          0         0    1

CodePudding user response:

You can use aggregate.

x <- aggregate(list(n=rep(1, nrow(my_data))), my_data, length)
#x <- aggregate(list(n=my_data[,1]), my_data, length) #Alternative
x[order(-x$n),]
#  Slept.Well Had.Breakfast Worked.out Meditated n
#4          0             1          1         0 2
#1          0             0          0         0 1
#2          1             1          0         0 1
#3          0             0          1         0 1
#5          0             0          0         1 1
#6          1             0          0         1 1
#7          0             1          0         1 1
#8          0             0          1         1 1
#9          0             1          1         1 1
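
For the specific example in the question (slept well and meditated, but no breakfast and no workout), a minimal base R sketch is below. It assumes the dotted column names that data.frame() generates; target and combo are illustrative names only. It also shows one way to collapse each row into a single label such as "1001" so that table() can rank the combinations.

```r
# Recreate the example data from the question
set.seed(5)
my_data <- data.frame("Slept Well" = sample(c(0, 1), 10, TRUE),
                      "Had Breakfast" = sample(c(0, 1), 10, TRUE),
                      "Worked out" = sample(c(0, 1), 10, TRUE),
                      "Meditated" = sample(c(0, 1), 10, TRUE))

# Logical vector marking rows that match the one combination of interest
target <- with(my_data,
               Slept.Well == 1 & Meditated == 1 &
                 Had.Breakfast == 0 & Worked.out == 0)
sum(target)  # number of matching observations

# Paste the columns of each row into one key, in column order
combo <- do.call(paste, c(my_data, sep = ""))

# Frequency of every key, most frequent first
sort(table(combo), decreasing = TRUE)
```

The pasted key is compact, but it only stays readable while you remember the column order, so the aggregate() output above is easier to interpret for wide data.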