I am new to Stack Overflow and couldn't find an answer to my question. I would really appreciate any input!
I have a dataset that I am tidying. I want to recode a variable with 202 columns into a table with binary values. It is from a "check all that apply" survey. The output from the survey looks like this:
Participant Language
1 'English|French'
2 'English'
3 'Spanish|French'
4 'English|Spanish'
5 'French'
The variable's output has the languages separated by '|', so I can't tabulate it directly. I would like the result to look like this:
Participant | English | French | Spanish |
---|---|---|---|
1 | 1 | 1 | 0 |
2 | 1 | 0 | 0 |
3 | 0 | 1 | 1 |
4 | 1 | 0 | 1 |
5 | 0 | 1 | 0 |
I'm not sure how to do this without using `ifelse` and creating `or` arguments for each possible combination of languages. I would really appreciate any tips!
Note: the actual dataset is not focused on language, but the format is the same. There are far more than 3 choices, so I am hoping to find an efficient way to do this.
CodePudding user response:
With `tidyverse` you could try the following. With `separate_rows` you can add rows for each language. Then, add a temporary column to indicate 1 when the language is present for the participant. Finally, `pivot_wider` would put the result into the desired format.
library(tidyverse)
df %>%
  separate_rows(Language) %>%
  mutate(Present = 1) %>%
  pivot_wider(id_cols = Participant,
              names_from = Language,
              values_from = Present,
              values_fill = 0)
Output
  Participant English French Spanish
        <int>   <dbl>  <dbl>   <dbl>
1           1       1      1       0
2           2       1      0       0
3           3       0      1       1
4           4       1      0       1
5           5       0      1       0
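One thing to watch with the real data: by default `separate_rows` splits on any run of non-alphanumeric characters, not only on '|'. That is fine for the example, but a choice containing a space would be split into two. Here is a minimal sketch that passes the separator explicitly. The data frame `df2` and the "Sign Language" choice are made up for illustration and are not from the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical data (not from the question): one choice contains a space
df2 <- data.frame(
  Participant = 1:3,
  Language = c("English|Sign Language", "English", "Sign Language")
)

wide <- df2 %>%
  # Split only on a literal pipe; "\\|" escapes the regex alternation operator
  separate_rows(Language, sep = "\\|") %>%
  mutate(Present = 1) %>%
  pivot_wider(id_cols = Participant,
              names_from = Language,
              values_from = Present,
              values_fill = 0)

wide
```

With the explicit separator, "Sign Language" should stay as a single column instead of becoming separate "Sign" and "Language" columns.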
CodePudding user response:
You will need to define the columns you want first. You can either do this manually:
cols <- c("English", "French", "Spanish")
Or automatically:
cols <- unique(unlist(strsplit(df$Language, "\\|")))
cols
#> [1] "English" "French" "Spanish"
In either case, your result can be obtained like this:
cbind(df[1], setNames(as.data.frame(lapply(cols, function(x) {
  as.numeric(grepl(x, df$Language))
})), cols))
#>   Participant English French Spanish
#> 1           1       1      1       0
#> 2           2       1      0       0
#> 3           3       0      1       1
#> 4           4       1      0       1
#> 5           5       0      1       0
Created on 2022-03-13 by the reprex package (v2.0.1)
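A caveat if your real choices are less tidy than language names: `grepl` does regular-expression, partial matching, so a choice whose text is contained in another choice (or one that contains characters like `(` or `.`) could be flagged incorrectly. As a possible variant, here is a sketch that splits each response once and compares whole choices with `%in%`. The names `choices`, `indicators` and `result` are just illustrative, and sorting `cols` is my own choice for a stable column order.

```r
# Same example data as in the question
df <- data.frame(
  Participant = 1:5,
  Language = c("English|French", "English", "Spanish|French",
               "English|Spanish", "French"),
  stringsAsFactors = FALSE
)

# Split each response on a literal "|" (fixed = TRUE turns off regex matching)
choices <- strsplit(df$Language, "|", fixed = TRUE)

# All distinct choices, sorted so the column order is predictable
cols <- sort(unique(unlist(choices)))

# One row per participant: 1 if the choice was selected, 0 otherwise
indicators <- do.call(rbind, lapply(choices, function(x) as.numeric(cols %in% x)))
colnames(indicators) <- cols

result <- cbind(df["Participant"], indicators)
result
```

Because whole strings are compared, a choice only counts when it appears as a complete item between the separators.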
Data
df <- structure(list(Participant = 1:5,
                     Language = c("English|French", "English", "Spanish|French",
                                  "English|Spanish", "French")),
                class = "data.frame", row.names = c(NA, -5L))
df
#>   Participant        Language
#> 1           1  English|French
#> 2           2         English
#> 3           3  Spanish|French
#> 4           4 English|Spanish
#> 5           5          French