I want to restructure some "multiple response" survey data from binary to nominal categories.
The survey asks each respondent which ten people they interact with most often, choosing from a list of 50 names. The data comes back with 50 columns, one per name; a cell holds the name if that person was selected and is blank otherwise. I want to convert the fifty columns into ten columns (name1 to name10).
Below is a simplified example of what I mean: 5 names and five respondents, each of whom must select two names.
id <- 1:5
mike <- c("","mike","","","mike")
tim <- c("tim","","tim","","")
mary <- c("mary","mary","mary","","")
jane <- c("","","","jane","jane")
liz <- c("","","","liz","")
surveyData <- data.frame(id,mike,tim,mary,jane,liz)
Name1 <- c("tim","mike","tim","jane","mike")
Name2 <- c("mary","mary","mary","liz","jane")
restructuredSurveyData <- data.frame(id,Name1,Name2)
Thanks for your help!
CodePudding user response:
Replace '' with NA, then apply na.omit to each row.
cbind(surveyData[1], `colnames<-`(t(apply(replace(surveyData[-1],
surveyData[-1] == '', NA), 1,
na.omit)), paste0('name_', 1:2)))
# id name_1 name_2
# 1 1 tim mary
# 2 2 mike mary
# 3 3 tim mary
# 4 4 jane liz
# 5 5 mike jane
If you prefer the native pipe, the same steps can be written as:
replace(surveyData[-1], surveyData[-1] == '', NA) |>
apply(1, na.omit) |>
t() |>
`colnames<-`(paste0('name_', 1:2)) |>
cbind(surveyData[1]) |>
subset(select=c('id', 'name_1', 'name_2'))
# id name_1 name_2
# 1 1 tim mary
# 2 2 mike mary
# 3 3 tim mary
# 4 4 jane liz
# 5 5 mike jane
Note: the native pipe |> requires R >= 4.1.
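One caveat for the real survey: apply(..., 1, na.omit) only simplifies to a matrix when every respondent selected the same number of names. If someone picks fewer, apply() returns a list and t() no longer works. Below is a minimal base R sketch of one way around this, assuming blanks are empty strings as in the question. The helper pick_names and the variable k are hypothetical names introduced here for illustration.

```r
# Toy data from the question
surveyData <- data.frame(
  id   = 1:5,
  mike = c("", "mike", "", "", "mike"),
  tim  = c("tim", "", "tim", "", ""),
  mary = c("mary", "mary", "mary", "", ""),
  jane = c("", "", "", "jane", "jane"),
  liz  = c("", "", "", "liz", "")
)

k <- 2  # selections per respondent (assumed; 10 in the real survey)

# Hypothetical helper: keep the non-blank values of one row and
# pad with NA (or truncate) so the result always has length k
pick_names <- function(row, k) {
  chosen <- row[!is.na(row) & row != ""]
  unname(chosen)[seq_len(k)]
}

rows <- as.matrix(surveyData[-1])  # character matrix of the name columns

# rbind() over a list of length-k vectors always yields an n x k matrix
picked <- do.call(
  rbind,
  lapply(seq_len(nrow(rows)), function(i) pick_names(rows[i, ], k))
)
colnames(picked) <- paste0("name", seq_len(k))

restructured <- cbind(surveyData["id"], picked)
restructured
```

With k <- 10 and the 50 name columns this should give name1 to name10, with NA in any slot a respondent left unused.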
CodePudding user response:
Another possible solution, based on tidyverse:
library(tidyverse)
surveyData %>%
pivot_longer(-id) %>%
filter(value != "") %>%
mutate(nam = if_else(row_number() %% 2 == 1, "names1", "names2")) %>%
pivot_wider(id_cols = id, names_from = nam)
#> # A tibble: 5 × 3
#> id names1 names2
#> <int> <chr> <chr>
#> 1 1 tim mary
#> 2 2 mike mary
#> 3 3 tim mary
#> 4 4 jane liz
#> 5 5 mike jane
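The row_number() %% 2 trick relies on every respondent having exactly two selections. For ten selections, one option is to number the selections within each id instead. This is a sketch under that assumption, not a tested drop-in for the real data; the name restructured_tidy is introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Toy data from the question
surveyData <- data.frame(
  id   = 1:5,
  mike = c("", "mike", "", "", "mike"),
  tim  = c("tim", "", "tim", "", ""),
  mary = c("mary", "mary", "mary", "", ""),
  jane = c("", "", "", "jane", "jane"),
  liz  = c("", "", "", "liz", "")
)

restructured_tidy <- surveyData %>%
  pivot_longer(-id) %>%                # one row per (id, name column)
  filter(value != "") %>%              # keep only the selected names
  group_by(id) %>%
  mutate(nam = paste0("names", row_number())) %>%  # names1, names2, ... per id
  ungroup() %>%
  pivot_wider(id_cols = id, names_from = nam, values_from = value)

restructured_tidy
```

Respondents with fewer selections than the maximum get NA in the trailing columns.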
Or using purrr::pmap_df:
library(tidyverse)
pmap_df(surveyData[-1], ~ str_c(c(...)[c(...) != ""], collapse = ",") %>%
set_names("names")) %>%
separate(names, into = str_c("names", 1:2), sep = ",") %>%
bind_cols(select(surveyData, id), .)
#> id names1 names2
#> 1 1 tim mary
#> 2 2 mike mary
#> 3 3 tim mary
#> 4 4 jane liz
#> 5 5 mike jane