Restructure binary "multiple response" data to categorical

I want to restructure some "multiple response" survey data from binary to nominal categories.

The survey asks each respondent which ten people they interact with most often and gives a list of 50 names. The data comes back with 50 columns, one per name: a cell contains the name if the respondent selected it and is blank otherwise. I want to convert the 50 columns into ten columns (name1 to name10).

Below is a simplified example with five names and five respondents, where each respondent must select exactly two names.

# Input: one column per name; a cell holds the name if selected, "" otherwise
id <- 1:5
mike <- c("","mike","","","mike")
tim <- c("tim","","tim","","")
mary <- c("mary","mary","mary","","")
jane <- c("","","","jane","jane")
liz <- c("","","","liz","")

surveyData <- data.frame(id,mike,tim,mary,jane,liz)

# Desired output: one column per selection slot
Name1 <- c("tim","mike","tim","jane","mike")
Name2 <- c("mary","mary","mary","liz","jane")

restructuredSurveyData <- data.frame(id,Name1,Name2)

Thanks for your help!
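The restructuring only yields a rectangular result if every respondent selected the same number of names (two here, ten in the real survey). A minimal base-R sketch for checking that assumption first; the data is rebuilt from the question so the block runs on its own, and the variable names n_selected and expected are illustrative:

```r
# Example data from the question
surveyData <- data.frame(
  id   = 1:5,
  mike = c("", "mike", "", "", "mike"),
  tim  = c("tim", "", "tim", "", ""),
  mary = c("mary", "mary", "mary", "", ""),
  jane = c("", "", "", "jane", "jane"),
  liz  = c("", "", "", "liz", ""),
  stringsAsFactors = FALSE
)

# Count non-blank cells per respondent (every column except id)
n_selected <- rowSums(surveyData[-1] != "")
print(n_selected)

# Rows whose selection count differs from the expected number
expected <- 2  # assumption: would be 10 for the full 50-name survey
print(which(n_selected != expected))
```

An empty result from which() means every row can be mapped onto the same number of name columns.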

Answer:

Replace '' with NA, then apply na.omit to each row.

# Blank cells become NA so na.omit() can drop them row by row
cbind(surveyData[1],
      `colnames<-`(t(apply(replace(surveyData[-1], surveyData[-1] == '', NA),
                           1, na.omit)),
                   paste0('name_', 1:2)))
#   id name_1 name_2
# 1  1    tim   mary
# 2  2   mike   mary
# 3  3    tim   mary
# 4  4   jane    liz
# 5  5   mike   jane
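The t() call is there because apply(..., 1, FUN) stacks each row's result as a column. A small sketch of the intermediate step, assuming the example data from the question (the names blanked and per_respondent are illustrative):

```r
surveyData <- data.frame(
  id   = 1:5,
  mike = c("", "mike", "", "", "mike"),
  tim  = c("tim", "", "tim", "", ""),
  mary = c("mary", "mary", "mary", "", ""),
  jane = c("", "", "", "jane", "jane"),
  liz  = c("", "", "", "liz", ""),
  stringsAsFactors = FALSE
)

# Blank cells -> NA
blanked <- replace(surveyData[-1], surveyData[-1] == "", NA)

# apply() over rows: one column per respondent, one row per selection slot
per_respondent <- apply(blanked, 1, na.omit)
print(dim(per_respondent))     # slots x respondents

# Transposing gives one row per respondent, matching the desired layout
print(dim(t(per_respondent)))  # respondents x slots
```

This also shows why the approach depends on equal counts: if rows kept different numbers of names, apply() could not stack them into a matrix and would return a list instead.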

If you prefer pipes, the same steps read top to bottom with R's native pipe:

replace(surveyData[-1], surveyData[-1] == '', NA) |>
  apply(1, na.omit) |>
  t() |>
  `colnames<-`(paste0('name_', 1:2)) |>
  cbind(surveyData[1]) |>
  subset(select=c('id', 'name_1', 'name_2'))
#   id name_1 name_2
# 1  1    tim   mary
# 2  2   mike   mary
# 3  3    tim   mary
# 4  4   jane    liz
# 5  5   mike   jane

Note: the native pipe |> requires R >= 4.1.
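Both versions hardcode two slots with 1:2 and assume every respondent picked exactly that many names. A minimal sketch that generalizes to k slots (ten for the real 50-name survey) and pads short rows with NA instead of failing. The helper name to_name_slots and its arguments are hypothetical, not part of the answers above:

```r
# Hypothetical helper: turn one-column-per-name data into k name slots,
# padding with NA when a respondent selected fewer than k names.
to_name_slots <- function(df, id_col = "id", k = 10) {
  resp <- as.matrix(df[setdiff(names(df), id_col)])
  picks <- lapply(seq_len(nrow(resp)), function(i) {
    x <- resp[i, ]
    x <- x[!is.na(x) & x != ""]          # keep selected names, in column order
    if (length(x) > k) stop("row ", i, " has more than ", k, " selections")
    c(unname(x), rep(NA_character_, k - length(x)))
  })
  slots <- do.call(rbind, picks)         # one row per respondent
  colnames(slots) <- paste0("name", seq_len(k))
  data.frame(df[id_col], slots, stringsAsFactors = FALSE)
}

surveyData <- data.frame(
  id   = 1:5,
  mike = c("", "mike", "", "", "mike"),
  tim  = c("tim", "", "tim", "", ""),
  mary = c("mary", "mary", "mary", "", ""),
  jane = c("", "", "", "jane", "jane"),
  liz  = c("", "", "", "liz", ""),
  stringsAsFactors = FALSE
)
res <- to_name_slots(surveyData, k = 2)
print(res)

# A respondent who picked only one name gets NA in the second slot
short <- surveyData
short$mary[1] <- ""
res_short <- to_name_slots(short, k = 2)
print(res_short)
```

For the real survey the call would be to_name_slots(surveyData, k = 10).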

Answer:

Another possible solution, based on the tidyverse:

library(tidyverse)

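surveyData %>% 
  pivot_longer(-id) %>%
  filter(value != "") %>%
  # number the selections within each respondent (works for any count)
  group_by(id) %>%
  mutate(nam = paste0("names", row_number())) %>%
  ungroup() %>%
  pivot_wider(id_cols = id, names_from = nam, values_from = value)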

#> # A tibble: 5 × 3
#>      id names1 names2
#>   <int> <chr>  <chr> 
#> 1     1 tim    mary  
#> 2     2 mike   mary  
#> 3     3 tim    mary  
#> 4     4 jane   liz   
#> 5     5 mike   jane
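Without the tidyverse, the same lengthen, number, widen pattern can be sketched in base R. This swaps in ave() and stats::reshape() in place of the tidyr functions; it is an illustrative alternative, not part of the answer above:

```r
surveyData <- data.frame(
  id   = 1:5,
  mike = c("", "mike", "", "", "mike"),
  tim  = c("tim", "", "tim", "", ""),
  mary = c("mary", "mary", "mary", "", ""),
  jane = c("", "", "", "jane", "jane"),
  liz  = c("", "", "", "liz", ""),
  stringsAsFactors = FALSE
)

# Long format: one row per (id, name column) cell, stacked column by column
long <- data.frame(
  id    = rep(surveyData$id, times = ncol(surveyData) - 1),
  value = unlist(surveyData[-1], use.names = FALSE),
  stringsAsFactors = FALSE
)
long <- long[long$value != "", ]

# Sort by id; order() is stable, so each respondent's names keep column order
long <- long[order(long$id), ]

# Number the selections within each id (like row_number() per group)
long$slot <- ave(seq_along(long$id), long$id, FUN = seq_along)

# Widen: one row per id, one column per slot (missing slots become NA)
wide <- reshape(long, idvar = "id", timevar = "slot",
                v.names = "value", direction = "wide")
names(wide) <- sub("^value\\.", "names", names(wide))
rownames(wide) <- NULL
print(wide)
```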

Or using purrr::pmap_df:

library(tidyverse)

pmap_df(surveyData[-1], ~ str_c(c(...)[c(...) != ""], collapse = ",") %>% 
        set_names("names")) %>% 
  separate(names, into = str_c("names", 1:2), sep = ",") %>%
  bind_cols(select(surveyData, id), .)

#>   id names1 names2
#> 1  1    tim   mary
#> 2  2   mike   mary
#> 3  3    tim   mary
#> 4  4   jane    liz
#> 5  5   mike   jane