I have this example dataset
df <- data.frame(subjects = 1:12,
Why_are_you_not_happy =
c(1,2,"1,2,5",5,1,2,"3,4",3,2,"1,5",3,4),
why_are_you_sad =
c("1,2,3",1,2,3,"4,5,3",2,1,4,3,1,1,1) )
And would like to convert it into a dummy variables format (based on the 5 answers of each question). Can someone guide me through an effective way ? thanks.
CodePudding user response:
You can separate_rows
for multiple choices, convert to dummy and summarise
by subjects
(to get one row per subjects, with all their choices).
library(fastDummies)
library(tidyr)
library(dplyr)
df %>%
separate_rows(Why_are_you_not_happy, why_are_you_sad) %>%
dummy_cols(c("Why_are_you_not_happy", "why_are_you_sad"),
remove_selected_columns = TRUE) %>%
group_by(subjects) %>%
summarise(across(everything(), max))
output
# A tibble: 12 × 11
subjects Why_are_you…¹ Why_a…² Why_a…³ Why_a…⁴ Why_a…⁵ why_a…⁶ why_a…⁷ why_a…⁸ why_a…⁹ why_a…˟
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 1 0 0 0 0 1 1 1 0 0
2 2 0 1 0 0 0 1 0 0 0 0
3 3 1 1 0 0 1 0 1 0 0 0
4 4 0 0 0 0 1 0 0 1 0 0
5 5 1 0 0 0 0 0 0 1 1 1
6 6 0 1 0 0 0 0 1 0 0 0
7 7 0 0 1 1 0 1 0 0 0 0
8 8 0 0 1 0 0 0 0 0 1 0
9 9 0 1 0 0 0 0 0 1 0 0
10 10 1 0 0 0 1 1 0 0 0 0
11 11 0 0 1 0 0 1 0 0 0 0
12 12 0 0 0 1 0 1 0 0 0 0