Rearrange Dataframe in R - Merging Unique Repeated Values into Rows-CodePudding

First time asking a question so apologies if my formatting is off.

I have searched for 3 days online to find an answer for this but have been unable to find anything. I have a dataframe that is comprised on 3 columns: repeated randomly generated participant IDs (e.g., W21334D0, B8123K), a question number (categorical - e.g., q1, q2, q3), and a question response (both numerical and categorical - e.g., "1", "1,2", "15,20,15").

For example, the dataframe I will always be given is in the form:

Participant	question_id	question_answer
W21334D0	q1	1
W21334D0	q2	1,2
W21334D0	q3	0
W21334D0	q4	1
B8123K	q3	1
B8123K	q2	2,1
B8123K	q4	0
P0213MEW	q1	1
P0213MEW	q3	0
P0213MEW	q4	1
P0213MEW	q2	1,2

I'm hoping to rearrange the dataframe or create a new one such that each unique participant id is a single row where each unique question_id is a column in order of q1-q96.

For example:

Participant	q1	q2	q3	q4
W21334D0	1	1,2	0	1
B8123K	NA	2,1	1	0
P0213MEW	1	1,2	0	1

I've tried a variety of things over the past couple of days, and the closest I've gotten is creating separate dataframes for each unique participant that includes rows of the participants question_ids and question_answers (but not in order).

To do this, I did:

for(i in unique(dat$participant)) {
  nam <- paste(i)
  assign(nam, t(dat[dat$participant==i,-1]))
}

But with this code, I don't know how to combine the dataframes into a single dataframe, nor do so in such a way that the rows are all in order of "q1, q2, q3, q4..."

Any help would be greatly appreciated!

CodePudding user response：

reshape2

reshape2::dcast(dat, Participant ~ question_id, value.var = "question_answer")
#   Participant   q1  q2 q3 q4
# 1      B8123K <NA> 2,1  1  0
# 2    P0213MEW    1 1,2  0  1
# 3    W21334D0    1 1,2  0  1

(This also works with the data.table package, if dat inherits data.table.)

dplyr

tidyr::pivot_wider(dat, Participant, names_from = "question_id", values_from = "question_answer")
# # A tibble: 3 x 5
#   Participant q1    q2    q3    q4   
#   <chr>       <chr> <chr> <chr> <chr>
# 1 W21334D0    1     1,2   0     1    
# 2 B8123K      <NA>  2,1   1     0    
# 3 P0213MEW    1     1,2   0     1

Data

dat <- structure(list(Participant = c("W21334D0", "W21334D0", "W21334D0", "W21334D0", "B8123K", "B8123K", "B8123K", "P0213MEW", "P0213MEW", "P0213MEW", "P0213MEW"), question_id = c("q1", "q2", "q3", "q4", "q3", "q2", "q4", "q1", "q3", "q4", "q2"), question_answer = c("1", "1,2", "0", "1", "1", "2,1", "0", "1", "0", "1", "1,2")), class = "data.frame", row.names = c(NA, -11L))

CodePudding user response：

You can use pivot_wider() in the package tidyr to achieve that goal. Below, with a sample data:

dt <- data.frame("Participant" = sample(c("W21334D0", "B8123K", "P0213MEW"), 12, replace = T),
                 "question_id" = sample(paste0("q", 1:4), 12, replace = T),
                 "question_answer" = sample(rnorm(100, 1, 1), 12, replace = T))

library(tidyr)

dt %>% 
  pivot_wider(id_cols = "Participant", 
              names_from = "question_id", 
              values_from = "question_answer", 
              values_fn = mean)