How to table two variables which contain listed observations?

Time:09-23

Two of the variables in the df I am working with may contain multiple values per observation. I want to table the frequencies of these variables, but can't use table() on type 'list'... I've created a sample df below:

library(dplyr)

col_a <- c("a", "b", "c", "a,b", "b,c")
col_b <- c("c", "b", "a", "a,a", "a,c")
df <- data.frame(col_a, col_b)
# Split each comma-separated string into a character vector (list columns)
df <- df %>% 
  mutate(col_a = strsplit(col_a, ","),
         col_b = strsplit(col_b, ","))

This outputs:

         col_a        col_b
1            a            c
2            b            b
3            c            a
4  c("a", "b")  c("a", "a")
5  c("b", "c")  c("a", "c")

Now, table(df$col_a, df$col_b) returns Error in order(y) : unimplemented type 'list' in 'orderVector1'. In order to table the variables, I want to unlist the concatenated observations so that it looks like this:

  col_a col_b
1     a     c
2     b     b
3     c     a
4     a     a
5     a     a
6     b     a
7     b     a
8     b     a
9     b     c
10    c     a
11    c     c

Any ideas on how to accomplish this?

Answer:

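We can use separate_rows() from tidyr on the original data, i.e. the data frame as it was before the strsplit() step, while col_a and col_b are still comma-separated strings. Splitting col_a and then col_b yields one row per combination of values, and the resulting character columns can be passed to table().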

library(tidyr)
library(dplyr)
df %>% 
   separate_rows(col_a) %>%
   separate_rows(col_b)

Output:

# A tibble: 11 × 2
   col_a col_b
   <chr> <chr>
 1 a     c    
 2 b     b    
 3 c     a    
 4 a     a    
 5 a     a    
 6 b     a    
 7 b     a    
 8 b     a    
 9 b     c    
10 c     a    
11 c     c    
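The answer stops at the long format, so here is a minimal sketch of the remaining step, assuming the goal is a two-way frequency table of col_a against col_b. The names long, long_from_list and tab are illustrative and not from the original post. It also shows one possible route for the case where only the list-column version of df is available: tidyr's unnest(), applied to each column in turn, should give the same rows.

```r
library(dplyr)
library(tidyr)

# Original data: comma-separated strings, before any strsplit() step
df <- data.frame(
  col_a = c("a", "b", "c", "a,b", "b,c"),
  col_b = c("c", "b", "a", "a,a", "a,c")
)

# Long format: one row per combination of col_a and col_b values
long <- df %>%
  separate_rows(col_a, sep = ",") %>%
  separate_rows(col_b, sep = ",")

# Two-way frequency table of the expanded values
tab <- table(long$col_a, long$col_b)
print(tab)

# Alternative starting from list columns (as built in the question):
# unnest each list column in turn instead of splitting strings
df_list <- df %>%
  mutate(col_a = strsplit(col_a, ","),
         col_b = strsplit(col_b, ","))

long_from_list <- df_list %>%
  unnest(col_a) %>%
  unnest(col_b)
```

With the sample data, both routes give the 11 rows shown above, and the table counts the pair (b, a) three times and (a, a) twice.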
Tags: r