Coding a multiple-response question in RStudio

Let's say we have the question "Why are you not happy?" with 5 possible answers (1, 2, 3, 4, 5):

s = data.frame(subjects = 1:12,
  Why_are_you_not_happy = c(1,2,4,5,1,2,4,3,2,1,3,4))

In the previous example, every subject picked only one option. But let's say that subjects 3, 7 and 10 each picked more than one option:

subject 3 : options 1,2,5
subject 7 : options 3,4
subject 10 : options 1,5

I want to code the answers to this question so that the multiple selections of these 3 subjects are kept, while preserving the shape of the data frame.

The second scenario is a data frame that includes 2 such questions, as follows:

df <- data.frame(subjects = 1:12,
                 Why_are_you_not_happy = 
                   c(1,2,"1,2,5",5,1,2,"3,4",3,2,"1,5",3,4),
                 why_are_you_sad = 
                   c("1,2,3",1,2,3,"4,5,3",2,1,4,3,1,1,1) )

How can we code the answers properly in the first and second scenarios? The objective is to apply multiple correspondence analysis (MCA).

Thank you

CodePudding user response：

I may have misunderstood, but it sounds like you want the separate() function from the tidyr package, e.g.

library(tidyr)

df <- data.frame(subjects = 1:12,
                 Why_are_you_not_happy = c(1,2,"1,2,5",5,1,2,"3,4",3,2,"1,5",3,4))
df
#>    subjects Why_are_you_not_happy
#> 1         1                     1
#> 2         2                     2
#> 3         3                 1,2,5
#> 4         4                     5
#> 5         5                     1
#> 6         6                     2
#> 7         7                   3,4
#> 8         8                     3
#> 9         9                     2
#> 10       10                   1,5
#> 11       11                     3
#> 12       12                     4

df %>%
  separate(Why_are_you_not_happy,
           sep = ",", into = c("Answer_1",
                               "Answer_2",
                               "Answer_3"))
#> Warning: Expected 3 pieces. Missing pieces filled with `NA` in 11 rows [1, 2, 4,
#> 5, 6, 7, 8, 9, 10, 11, 12].
#>    subjects Answer_1 Answer_2 Answer_3
#> 1         1        1     <NA>     <NA>
#> 2         2        2     <NA>     <NA>
#> 3         3        1        2        5
#> 4         4        5     <NA>     <NA>
#> 5         5        1     <NA>     <NA>
#> 6         6        2     <NA>     <NA>
#> 7         7        3        4     <NA>
#> 8         8        3     <NA>     <NA>
#> 9         9        2     <NA>     <NA>
#> 10       10        1        5     <NA>
#> 11       11        3     <NA>     <NA>
#> 12       12        4     <NA>     <NA>
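
The warning is only informational here: most rows simply have fewer than three answers. As a minimal sketch, assuming the same `df` as above, `separate()` accepts `fill = "right"` to pad the short rows with `NA` without warning, and `convert = TRUE` to turn the split pieces into integers instead of character strings. The name `wide` is just an illustrative choice.

```r
library(tidyr)

df <- data.frame(subjects = 1:12,
                 Why_are_you_not_happy = c(1,2,"1,2,5",5,1,2,"3,4",3,2,"1,5",3,4))

# Same split as above, but short rows are padded on the right with NA
# silently, and the resulting columns are converted to integer.
wide <- separate(df, Why_are_you_not_happy,
                 into = c("Answer_1", "Answer_2", "Answer_3"),
                 sep = ",",
                 fill = "right",
                 convert = TRUE)

str(wide)
```

This keeps one row per subject, so the shape of the original data frame is preserved apart from the extra answer columns.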

Or, perhaps in long format? E.g.

df %>%
  separate(Why_are_you_not_happy,
           sep = ",", into = c("Answer_1",
                               "Answer_2",
                               "Answer_3")) %>%
  pivot_longer(-subjects) %>%
  na.omit()
#> Warning: Expected 3 pieces. Missing pieces filled with `NA` in 11 rows [1, 2, 4,
#> 5, 6, 7, 8, 9, 10, 11, 12].
#> # A tibble: 16 × 3
#>    subjects name     value
#>       <int> <chr>    <chr>
#>  1        1 Answer_1 1    
#>  2        2 Answer_1 2    
#>  3        3 Answer_1 1    
#>  4        3 Answer_2 2    
#>  5        3 Answer_3 5    
#>  6        4 Answer_1 5    
#>  7        5 Answer_1 1    
#>  8        6 Answer_1 2    
#>  9        7 Answer_1 3    
#> 10        7 Answer_2 4    
#> 11        8 Answer_1 3    
#> 12        9 Answer_1 2    
#> 13       10 Answer_1 1    
#> 14       10 Answer_2 5    
#> 15       11 Answer_1 3    
#> 16       12 Answer_1 4
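
Since the stated goal is MCA, another option that may fit better is an indicator coding: one Yes/No column per answer option, still with one row per subject. The following is a rough sketch in base R, not a definitive recipe. It uses the two-question data frame from the question; `indicator_columns()` and the prefixes `not_happy` and `sad` are hypothetical names made up for this example, and it assumes the options are always 1 to 5.

```r
df <- data.frame(subjects = 1:12,
                 Why_are_you_not_happy =
                   c(1,2,"1,2,5",5,1,2,"3,4",3,2,"1,5",3,4),
                 why_are_you_sad =
                   c("1,2,3",1,2,3,"4,5,3",2,1,4,3,1,1,1))

# Hypothetical helper: builds one Yes/No factor column per answer option.
# x       : vector of answers, possibly comma-separated ("1,2,5")
# options : the answer codes to look for (assumed to be 1 to 5 here)
# prefix  : used to name the new columns, e.g. "not_happy_1"
indicator_columns <- function(x, options = 1:5, prefix = "q") {
  # Split each answer string into its individual option codes
  picked <- strsplit(as.character(x), ",", fixed = TRUE)
  cols <- lapply(options, function(opt) {
    chosen <- vapply(picked,
                     function(p) as.character(opt) %in% trimws(p),
                     logical(1))
    factor(ifelse(chosen, "Yes", "No"), levels = c("No", "Yes"))
  })
  names(cols) <- paste(prefix, options, sep = "_")
  as.data.frame(cols)
}

# One row per subject, five indicator columns per question
coded <- cbind(
  df["subjects"],
  indicator_columns(df$Why_are_you_not_happy, prefix = "not_happy"),
  indicator_columns(df$why_are_you_sad, prefix = "sad")
)

# Inspect the subjects who picked several options
coded[c(1, 3, 7, 10), ]
```

The indicator columns are factors, which is the kind of input that MCA functions such as `FactoMineR::MCA()` generally expect; whether this coding suits your analysis depends on how you want multiple selections to be weighted.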

Created on 2022-10-05 by the reprex package (v2.0.1)

Does this solve your problem?