Home > Software design >  Analyzing and recoding multiple answer question in my dataframe
Analyzing and recoding multiple answer question in my dataframe

Time:12-13

I am trying to figure out the best way to transform and analyze this dataframe that has a column for each answer. I am trying to create a new column that displays the combination when multiple answers selected.

My original dataframe looks like this:

df <- data.frame('ID' = c("001", "002", "003", "004", "005", "006"),
    'Answer_1' = c("1", "1", "0", "0", "1", "0"),
    'Answer_2' = c("1", "0", "1", "0", "0", "1"),
    'Answer_3' = c("1", "1", "0", "1", "0", "1"))

I am trying to consolidate those multiple answers into a singular column so I can run descriptives of the different answer combinations.

I'm looking to come up with a dataframe that looks like this:

df1 <- data.frame('ID' = c("001", "002", "003", "004", "005", "006"),
'Answer_4' = c("Answer_1 | Answer_2 | Answer_2", "Answer_1 | Answer_3",                                                             
"Answer_2", "Answer_3", "Answer_1", "Answer_2 | Answer_3"

CodePudding user response:

If I understood this will help you

Data

df <- data.frame('ID' = c("001", "002", "003", "004", "005", "006"),
                 'Answer_1' = c("1", "1", "0", "0", "1", "0"),
                 'Answer_2' = c("1", "0", "1", "0", "0", "1"),
                 'Answer_3' = c("1", "1", "0", "1", "0", "1"))

Code

  collapse_str <- function(x){
  paste0(na.omit(x),collapse = " | ")
}                   
                  
df %>% 
  mutate(
   across(
     .cols = -ID,
     .fns = function(x) ifelse(x == 1,cur_column(),NA_character_)
     )
  ) %>% 
  rowwise() %>% 
  mutate(Answer_4 = collapse_str(c_across(cols = -ID)))

Output

# A tibble: 6 x 5
# Rowwise: 
  ID    Answer_1 Answer_2 Answer_3 Answer_4                      
  <chr> <chr>    <chr>    <chr>    <chr>                         
1 001   Answer_1 Answer_2 Answer_3 Answer_1 | Answer_2 | Answer_3
2 002   Answer_1 NA       Answer_3 Answer_1 | Answer_3           
3 003   NA       Answer_2 NA       Answer_2                      
4 004   NA       NA       Answer_3 Answer_3                      
5 005   Answer_1 NA       NA       Answer_1                      
6 006   NA       Answer_2 Answer_3 Answer_2 | Answer_3  

CodePudding user response:

You could convert the 0/1 as.logical and subset the names. The 0/1 should be numeric though, I thus added type.convert.

df[1] |>
  cbind(Answer_4=sapply(asplit(type.convert(df[-1], as.is=TRUE), 1), \(x) {
    names(df[-1])[as.logical(x)] |>
      paste(collapse=' | ')
  }))
#    ID                       Answer_4
# 1 001 Answer_1 | Answer_2 | Answer_3
# 2 002            Answer_1 | Answer_3
# 3 003                       Answer_2
# 4 004                       Answer_3
# 5 005                       Answer_1
# 6 006            Answer_2 | Answer_3

Data:

df <- structure(list(ID = c("001", "002", "003", "004", "005", "006"
), Answer_1 = c("1", "1", "0", "0", "1", "0"), Answer_2 = c("1", 
"0", "1", "0", "0", "1"), Answer_3 = c("1", "1", "0", "1", "0", 
"1")), class = "data.frame", row.names = c(NA, -6L))
  •  Tags:  
  • r
  • Related