First time asking a question so apologies if my formatting is off.
I have searched for 3 days online to find an answer for this but have been unable to find anything. I have a dataframe that is comprised on 3 columns: repeated randomly generated participant IDs (e.g., W21334D0, B8123K), a question number (categorical - e.g., q1, q2, q3), and a question response (both numerical and categorical - e.g., "1", "1,2", "15,20,15").
For example, the dataframe I will always be given is in the form:
Participant | question_id | question_answer |
---|---|---|
W21334D0 | q1 | 1 |
W21334D0 | q2 | 1,2 |
W21334D0 | q3 | 0 |
W21334D0 | q4 | 1 |
B8123K | q3 | 1 |
B8123K | q2 | 2,1 |
B8123K | q4 | 0 |
P0213MEW | q1 | 1 |
P0213MEW | q3 | 0 |
P0213MEW | q4 | 1 |
P0213MEW | q2 | 1,2 |
I'm hoping to rearrange the dataframe or create a new one such that each unique participant id is a single row where each unique question_id is a column in order of q1-q96.
For example:
Participant | q1 | q2 | q3 | q4 |
---|---|---|---|---|
W21334D0 | 1 | 1,2 | 0 | 1 |
B8123K | NA | 2,1 | 1 | 0 |
P0213MEW | 1 | 1,2 | 0 | 1 |
I've tried a variety of things over the past couple of days, and the closest I've gotten is creating separate dataframes for each unique participant that includes rows of the participants question_ids and question_answers (but not in order).
To do this, I did:
for(i in unique(dat$participant)) {
nam <- paste(i)
assign(nam, t(dat[dat$participant==i,-1]))
}
But with this code, I don't know how to combine the dataframes into a single dataframe, nor do so in such a way that the rows are all in order of "q1, q2, q3, q4..."
Any help would be greatly appreciated!
CodePudding user response:
reshape2
reshape2::dcast(dat, Participant ~ question_id, value.var = "question_answer")
# Participant q1 q2 q3 q4
# 1 B8123K <NA> 2,1 1 0
# 2 P0213MEW 1 1,2 0 1
# 3 W21334D0 1 1,2 0 1
(This also works with the data.table
package, if dat
inherits data.table
.)
dplyr
tidyr::pivot_wider(dat, Participant, names_from = "question_id", values_from = "question_answer")
# # A tibble: 3 x 5
# Participant q1 q2 q3 q4
# <chr> <chr> <chr> <chr> <chr>
# 1 W21334D0 1 1,2 0 1
# 2 B8123K <NA> 2,1 1 0
# 3 P0213MEW 1 1,2 0 1
Data
dat <- structure(list(Participant = c("W21334D0", "W21334D0", "W21334D0", "W21334D0", "B8123K", "B8123K", "B8123K", "P0213MEW", "P0213MEW", "P0213MEW", "P0213MEW"), question_id = c("q1", "q2", "q3", "q4", "q3", "q2", "q4", "q1", "q3", "q4", "q2"), question_answer = c("1", "1,2", "0", "1", "1", "2,1", "0", "1", "0", "1", "1,2")), class = "data.frame", row.names = c(NA, -11L))
CodePudding user response:
You can use pivot_wider() in the package tidyr
to achieve that goal.
Below, with a sample data:
dt <- data.frame("Participant" = sample(c("W21334D0", "B8123K", "P0213MEW"), 12, replace = T),
"question_id" = sample(paste0("q", 1:4), 12, replace = T),
"question_answer" = sample(rnorm(100, 1, 1), 12, replace = T))
library(tidyr)
dt %>%
pivot_wider(id_cols = "Participant",
names_from = "question_id",
values_from = "question_answer",
values_fn = mean)