Let's say we have the question "Why are you not happy?" with 5 possible answers (1, 2, 3, 4, 5):
s <- data.frame(subjects = 1:12,
                Why_are_you_not_happy = c(1,2,4,5,1,2,4,3,2,1,3,4))
In the previous example every subject picked only one option. But let's say that subjects 3, 7 and 10 each picked more than one option:
- subject 3 : options 1,2,5
- subject 7 : options 3,4
- subject 10 : options 1,5
I want to code the answers to this question so that the multiple selections of these 3 subjects are taken into account, while preserving the shape of the dataframe.
The next case is a dataframe that includes 2 questions, as follows:
df <- data.frame(subjects = 1:12,
                 Why_are_you_not_happy =
                   c(1,2,"1,2,5",5,1,2,"3,4",3,2,"1,5",3,4),
                 why_are_you_sad =
                   c("1,2,3",1,2,3,"4,5,3",2,1,4,3,1,1,1))
How can we code the answers properly in the first and second scenarios? The objective is to apply multiple correspondence analysis (MCA).
Thank you
CodePudding user response:
I may have misunderstood, but it sounds like you want the separate()
function from the tidyr package, e.g.
library(tidyr)
df <- data.frame(subjects = 1:12,
                 Why_are_you_not_happy = c(1,2,"1,2,5",5,1,2,"3,4",3,2,"1,5",3,4))
df
#> subjects Why_are_you_not_happy
#> 1 1 1
#> 2 2 2
#> 3 3 1,2,5
#> 4 4 5
#> 5 5 1
#> 6 6 2
#> 7 7 3,4
#> 8 8 3
#> 9 9 2
#> 10 10 1,5
#> 11 11 3
#> 12 12 4
df %>%
  separate(Why_are_you_not_happy,
           sep = ",", into = c("Answer_1",
                               "Answer_2",
                               "Answer_3"))
#> Warning: Expected 3 pieces. Missing pieces filled with `NA` in 11 rows [1, 2, 4,
#> 5, 6, 7, 8, 9, 10, 11, 12].
#> subjects Answer_1 Answer_2 Answer_3
#> 1 1 1 <NA> <NA>
#> 2 2 2 <NA> <NA>
#> 3 3 1 2 5
#> 4 4 5 <NA> <NA>
#> 5 5 1 <NA> <NA>
#> 6 6 2 <NA> <NA>
#> 7 7 3 4 <NA>
#> 8 8 3 <NA> <NA>
#> 9 9 2 <NA> <NA>
#> 10 10 1 5 <NA>
#> 11 11 3 <NA> <NA>
#> 12 12 4 <NA> <NA>
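As a side note, the warning above is only telling you that most rows have fewer than three answers. If that is expected, a minimal sketch (same data, and assuming padding on the right with NA is what you want) is to pass fill = "right" to separate(); the name `wide` is just a placeholder I picked for the result:

```r
library(tidyr)

df <- data.frame(subjects = 1:12,
                 Why_are_you_not_happy = c(1,2,"1,2,5",5,1,2,"3,4",3,2,"1,5",3,4))

# fill = "right" pads short rows with NA on the right and suppresses the warning
wide <- df %>%
  separate(Why_are_you_not_happy,
           sep = ",",
           into = c("Answer_1", "Answer_2", "Answer_3"),
           fill = "right")
```

The values are the same as in the output above, just without the warning.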
Or, perhaps in long format? E.g.
df %>%
  separate(Why_are_you_not_happy,
           sep = ",", into = c("Answer_1",
                               "Answer_2",
                               "Answer_3")) %>%
  pivot_longer(-subjects) %>%
  na.omit()
#> Warning: Expected 3 pieces. Missing pieces filled with `NA` in 11 rows [1, 2, 4,
#> 5, 6, 7, 8, 9, 10, 11, 12].
#> # A tibble: 16 × 3
#> subjects name value
#> <int> <chr> <chr>
#> 1 1 Answer_1 1
#> 2 2 Answer_1 2
#> 3 3 Answer_1 1
#> 4 3 Answer_2 2
#> 5 3 Answer_3 5
#> 6 4 Answer_1 5
#> 7 5 Answer_1 1
#> 8 6 Answer_1 2
#> 9 7 Answer_1 3
#> 10 7 Answer_2 4
#> 11 8 Answer_1 3
#> 12 9 Answer_1 2
#> 13 10 Answer_1 1
#> 14 10 Answer_2 5
#> 15 11 Answer_1 3
#> 16 12 Answer_1 4
Created on 2022-10-05 by the reprex package (v2.0.1)
Does this solve your problem?
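If the goal is MCA and you need to keep one row per subject, another option might be a yes/no indicator column for every question/option pair. This is only a sketch under my own assumptions: I am guessing that an indicator coding is what you mean by "proper coding", the "yes"/"no" labels and the name `indicators` are arbitrary, and it uses dplyr in addition to tidyr. It handles the two-question data frame from the question, and the one-question case works the same way:

```r
library(tidyr)
library(dplyr)

df <- data.frame(subjects = 1:12,
                 Why_are_you_not_happy =
                   c(1,2,"1,2,5",5,1,2,"3,4",3,2,"1,5",3,4),
                 why_are_you_sad =
                   c("1,2,3",1,2,3,"4,5,3",2,1,4,3,1,1,1))

indicators <- df %>%
  # stack the questions: one row per subject and question
  pivot_longer(-subjects, names_to = "question", values_to = "answer") %>%
  # split "1,2,5" into one row per chosen option
  separate_rows(answer, sep = ",") %>%
  # mark every option that was actually chosen
  mutate(chosen = "yes") %>%
  # back to one row per subject, one column per question/option pair
  pivot_wider(id_cols = subjects,
              names_from = c(question, answer),
              values_from = chosen,
              values_fill = "no",
              names_sort = TRUE)

# MCA implementations generally expect factors (assumption: yes/no levels)
indicators[-1] <- lapply(indicators[-1], factor, levels = c("no", "yes"))

# Not run, and untested here: one possible MCA call, assuming FactoMineR is installed
# FactoMineR::MCA(indicators[-1])
```

This gives 12 rows (one per subject) and columns such as Why_are_you_not_happy_1 or why_are_you_sad_3, so a subject who picked several options simply has "yes" in several columns.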