I have a multiple-choice-question with seven possible answers, my data looks like this:
Q12 | |
---|---|
1 | Inhalt, Ermöglichen Koalition |
2 | Inhalt, Ermöglichen Koalition, Verhindern Kanzlerschaft |
3 | Inhalt |
4 | Spitzenpolitiker |
My goal is to -> seperate the observations and create a binary matrix with seven variables ("Inhalt", "Arbeit", "Verhindern Koalition", "Ermöglichen Koalition", "Verhindern Kanzlerschaft", "Ermöglichen Kanzlerschaft", "Spitzenpolitiker") akin to this:
Inhalt | Ermöglichen Koalition | Verhindern Kanzlerschaft | Spitzenpolitiker | |
---|---|---|---|---|
1 | 1 | 1 | 0 | 0 |
2 | 1 | 1 | 1 | 0 |
3 | 1 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 1 |
I have tried einzeln_strategisch_2021 <- data.frame(strategisch_2021[, ! colnames (strategisch_2021) %in% "Q12"], model.matrix(~ Q12 - 1, strategisch_2021)) %>%
This gives me the matrix I want but it does not separate the observations, so now I have a matrix with 20 variables instead of the seven
also tried seperate() like this separate(Q12, into = c("Inhalt", "Arbeit", "Verhindern Koalition", "Ermöglichen Koalition", "Verhindern Kanzlerschaft", "Ermöglichen Kanzlerschaft", "Spitzenpolitiker"), ",") %>%
This does separate the observations, but not in the right order and without the matrix.
I also tried to use splitstackshape and the charMat-function, but I cant get that to work at all (judging from the description "Create a Binary Matrix from a List of Character Values") it should do exactly what I want.
Any help would be greatly appreciated as I have been stuck for days now ;)
CodePudding user response:
Another possible solution:
library(tidyverse)
df %>%
mutate(id = row_number()) %>%
separate_rows(Q12, sep = ", ") %>%
count(id, Q12) %>%
pivot_wider(id, names_from = Q12, values_from = n, values_fill = 0,
values_fn = length) %>%
bind_cols(Q12 = df$Q12, .) %>%
relocate(Q12, .after = "id")
#> # A tibble: 4 × 6
#> id Q12 `Ermöglichen K…` Inhalt `Verhindern Ka…` Spitzenpolitiker
#> <int> <chr> <int> <int> <int> <int>
#> 1 1 Inhalt, Ermög… 1 1 0 0
#> 2 2 Inhalt, Ermög… 1 1 1 0
#> 3 3 Inhalt 0 1 0 0
#> 4 4 Spitzenpoliti… 0 0 0 1
Or shorter:
library(tidyverse)
bind_rows(apply(df, 1, \(x) strsplit(x, ", ") %>% table), .id = "id") %>%
mutate(across(-id, ~ as.integer(.x) %>% pmin(1) %>% replace_na(0)))
#> # A tibble: 4 × 5
#> id `Ermöglichen Koalition` Inhalt `Verhindern Kanzlersch…` Spitzenpolitiker
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 1 0 0
#> 2 2 1 1 1 0
#> 3 3 0 1 0 0
#> 4 4 0 0 0 1
CodePudding user response:
Try this:
## Creating the dataframe
df = data.frame(Options = c("1","2","3","4","5","6","7"),
Ques12 =c("Alpha, Beta, Gamma", "Lima, Peru", "Beta", "YOLO, Gamma", "Victor", "Beta, Gamma", "YOLO, Peru"))
## Separating by delimiter and inserting new rows
df = df %>%
mutate(Ques12 = strsplit(as.character(Ques12), ",")) %>%
unnest(Ques12)
## Removing quotation marks
df$Ques12 = noquote(df$Ques12)
## Removing all white spaces
df$Ques12 = gsub(" ", "", df$Ques12, fixed = TRUE)
## Longer to wider format
df1 = df %>%
group_by(Options,Ques12) %>%
summarise(freq = n()) %>%
pivot_wider(names_from = Ques12, values_from = freq) %>%
replace(is.na(.), 0)
Output
> df
Options Ques12
1 1 Alpha, Beta, Gamma
2 2 Lima, Peru
3 3 Beta
4 4 YOLO, Gamma
5 5 Victor
6 6 Beta, Gamma
7 7 YOLO, Peru
> df1
# A tibble: 7 x 8
# Groups: Options [7]
Options Alpha Beta Gamma Lima Peru YOLO Victor
<chr> <int> <int> <int> <int> <int> <int> <int>
1 1 1 1 1 0 0 0 0
2 2 0 0 0 1 1 0 0
3 3 0 1 0 0 0 0 0
4 4 0 0 1 0 0 1 0
5 5 0 0 0 0 0 0 1
6 6 0 1 1 0 0 0 0
7 7 0 0 0 0 1 1 0
CodePudding user response:
We may use dummy_cols
from fastDummies
library(fastDummies)
library(stringr)
library(dplyr)
dummy_cols(df1, "Q12", split=",\\s*") %>%
setNames(., str_remove(names(.), "Q12_"))
-output
Q12 Inhalt Ermöglichen Koalition Verhindern Kanzlerschaft Spitzenpolitiker
1 Inhalt, Ermöglichen Koalition 1 1 0 0
2 Inhalt, Ermöglichen Koalition, Verhindern Kanzlerschaft 1 1 1 0
3 Inhalt 1 0 0 0
4 Spitzenpolitiker 0 0 0 1
It can also work with multiple columns
df1$Q13 <- df1$Q12
dummy_cols(df1, c("Q12", "Q13"), split=",\\s*",
remove_selected_columns = TRUE) %>%
setNames(., str_remove(names(.), "Q\\d _"))
-output
Inhalt Ermöglichen Koalition Verhindern Kanzlerschaft Spitzenpolitiker Inhalt Ermöglichen Koalition Verhindern Kanzlerschaft
1 1 1 0 0 1 1 0
2 1 1 1 0 1 1 1
3 1 0 0 0 1 0 0
4 0 0 0 1 0 0 0
Spitzenpolitiker
1 0
2 0
3 0
4 1
data
df1 <- structure(list(Q12 = c("Inhalt, Ermöglichen Koalition",
"Inhalt, Ermöglichen Koalition, Verhindern Kanzlerschaft",
"Inhalt", "Spitzenpolitiker")), class = "data.frame", row.names = c("1",
"2", "3", "4"))