Reshape dataframe from long to wide but amount of columns different for every row-CodePudding

I have the following dataframe format in R

question_id	answer_id	question_text	answer_text
10000	0001	how many people are alive?	20000
10000	0002	how many people are alive?	50000
10000	0003	how many people are alive?	60000
10000	0004	how many people are alive?	900000
20000	0021	what is the meaning of life?	yes
20000	0072	what is the meaning of life?	no
20000	0083	what is the meaning of life?	maybe
20000	0094	what is the meaning of life?	ok
20000	0097	what is the meaning of life?	indifference

I want it in the following format:

question_id	question_text	answer_text_1	answer_text_2	answer_text_3	answer_text_4	...	answer_text_n
10000	how many people are alive?	20000	50000	60000	900000	...	NA
20000	what is the meaning of life?	yes	no	maybe	ok	...	indifference

So as you can see I want the question_id, then the question_text itself and then a set of column which equal to the maximum amount of answers there can be to a question. So some questions are true or false so would only have 2 columns filled in. But also multiple choice questions can exist, which might have 7 different options to choose from. I want it to be adaptive.

The only thing I could think of was tidyr::pivot_wider(). I can't seem to make this work. Any help would be greatly appreciated. Thanks!!

========== edit ===========

the code that I tried

qa_columns <- function(qa_df, question_list){
  df <- qa_df %>%
    dplyr::filter(question_id %in% question_list) %>%
    tidyr::pivot_wider(values_from = answer_text)
  return(df)
}

qa <- qa_columns(qa_text, question_list = question_list)

CodePudding user response：

Try adding a row-number per group, then keeping question_text as an id:

library(dplyr)
library(tidyr) # pivot_wider
dat %>%
  group_by(question_id) %>%
  mutate(rn = row_number()) %>%
  ungroup() %>%
  pivot_wider(c("question_id", "question_text"), names_from = "rn", names_prefix = "answer_text_", values_from = "answer_text")
# # A tibble: 2 x 7
#   question_id question_text                answer_text_1 answer_text_2 answer_text_3 answer_text_4 answer_text_5
#         <int> <chr>                        <chr>         <chr>         <chr>         <chr>         <chr>        
# 1       10000 how many people are alive?   20000         50000         60000         900000        <NA>         
# 2       20000 what is the meaning of life? yes           no            maybe         ok            indifference

Note that this coerces numbers to character in your answer_text_# columns, generally unavoidable.

CodePudding user response：

Using dcast from data.table

library(data.table)
dcast(setDT(dat), question_id   question_text ~ 
   paste0("answer_text_", rowid(question_id)), value.var = "answer_text")

-output

 question_id                question_text answer_text_1 answer_text_2 answer_text_3 answer_text_4 answer_text_5
1:       10000   how many people are alive?         20000         50000         60000        900000          <NA>
2:       20000 what is the meaning of life?           yes            no         maybe            ok  indifference

data

dat <- structure(list(question_id = c(10000L, 10000L, 10000L, 10000L, 
20000L, 20000L, 20000L, 20000L, 20000L), answer_id = c(1L, 2L, 
3L, 4L, 21L, 72L, 83L, 94L, 97L), question_text = c("how many people are alive?", 
"how many people are alive?", "how many people are alive?", "how many people are alive?", 
"what is the meaning of life?", "what is the meaning of life?", 
"what is the meaning of life?", "what is the meaning of life?", 
"what is the meaning of life?"), answer_text = c("20000", "50000", 
"60000", "900000", "yes", "no", "maybe", "ok", "indifference"
)), class = "data.frame", row.names = c(NA, -9L))