Calculate combinations of several categorical variables in R

Time:05-18

I have a data frame consisting mainly of categorical variables. I want to count how often each combination of values occurs across three of these columns. The data in the columns looks like this:

number_arms <- c("6","8","12")
arrangements <- c("single", "paired", "ornament")
approx_position <- c("top", "middle", "bottom")
rg2 <- data.frame(number_arms, arrangements, approx_position)

Another post suggested the following code for counting combinations across two columns:

library(dplyr)
library(stringr)
rg2 %>%
     count(combination = str_c(pmin(number_arms, arrangements), ' - ',
       pmax(number_arms, arrangements)), name = "count") 

This is the result on my full data:

combination   count
12 - single    1            
16 - single    1            
4 - paired     3            
4 - single     4            
5 - paired     4            
5 - single     2            
6 - ornament   1            
6 - paired    81    

However, the code does not give the results I want when I add the third column, like this:

rg2 %>%
     count(combination = str_c(pmin(number_arms, arrangements, approx_position), ' - ',
       pmax(number_arms, arrangements, approx_position)), name = "count") 

The code still runs without error, but the results are wrong. Do I need different code to count the combinations of three variables?

CodePudding user response:

If you're looking for the count of each combination of the variables, excluding combinations that never occur (a count of 0), you can do:

subset(data.frame(table(rg2)), Freq > 0)

   number_arms arrangements approx_position Freq
1           12     ornament          bottom    1
15           8       paired          middle    1
26           6       single             top    1

Data:

number_arms <- c("6","8","12")
arrangements <- c("single", "paired", "ornament")
approx_position <- c("top", "middle", "bottom")
rg2 <- data.frame(number_arms, arrangements, approx_position)
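
Because every combination in the sample data occurs exactly once, all counts above are 1. A minimal sketch with repeated rows shows the same table() approach returning larger counts. The data frame `demo` and its values are made up for illustration and are not from the question:

```r
# Hypothetical data with repeated combinations (not the asker's data)
demo <- data.frame(
  number_arms     = c("6", "6", "8", "6"),
  arrangements    = c("single", "single", "paired", "paired"),
  approx_position = c("top", "top", "middle", "top"),
  stringsAsFactors = FALSE
)

# table() cross-tabulates all three columns; as.data.frame() turns the
# 3-way table into long format with one row per possible combination
res <- subset(as.data.frame(table(demo)), Freq > 0)

# Keep only the combinations that actually occur
print(res)
```

Here the combination 6 / single / top appears twice in `demo`, so its Freq should be 2, while the other two observed combinations have a Freq of 1.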

CodePudding user response:

Tidyverse option:

library(dplyr)

rg2 %>%
  group_by(number_arms, arrangements, approx_position) %>% 
  count()

Result:

  number_arms arrangements approx_position     n
  <chr>       <chr>        <chr>           <int>
1 12          ornament     bottom              1
2 6           single       top                 1
3 8           paired       middle              1

CodePudding user response:

You can try count() followed by unite():

library(dplyr)
library(tidyr)

rg2 %>%
  count(number_arms, arrangements, approx_position, name = "count") %>%
  unite(combination, -count, sep = " - ")

#              combination count
# 1 12 - ornament - bottom     1
# 2       6 - single - top     1
# 3    8 - paired - middle     1
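
As for why the original pmin()/pmax() attempt goes wrong with three columns: pmin() and pmax() each return a single value per row (the smallest and the largest of the inputs), so with three columns the value that sorts in between is dropped from the label. A minimal base R sketch using the sample data from the question; the variable names `lo`, `hi`, `labels_minmax` and `labels_all` are introduced here for illustration only:

```r
rg2 <- data.frame(
  number_arms     = c("6", "8", "12"),
  arrangements    = c("single", "paired", "ornament"),
  approx_position = c("top", "middle", "bottom"),
  stringsAsFactors = FALSE
)

# Row-wise smallest and largest string across the three columns
# (string ordering depends on the locale)
lo <- pmin(rg2$number_arms, rg2$arrangements, rg2$approx_position)
hi <- pmax(rg2$number_arms, rg2$arrangements, rg2$approx_position)

# Only two values per row survive, so one column's value is lost
labels_minmax <- paste(lo, hi, sep = " - ")

# Pasting all three columns keeps every value in the label
labels_all <- paste(rg2$number_arms, rg2$arrangements,
                    rg2$approx_position, sep = " - ")

print(labels_minmax)
print(labels_all)
```

Each entry of `labels_minmax` contains two parts and each entry of `labels_all` contains three, which is why counting all three columns directly (as in the answers above) is the safer route.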