I have a survey datasheet for a group of participants and each participant has a unique ID. But all of the questions and answers were listed as individual rows for each participant, for example:
I would like to clean up the data and add each question as an individual column with one participant correspond to one row with all questions and answers using R like below: enter image description here
I assume there is something using the "group_by()" function, but I am not sure how exactly should I do it. Thank you so much!
CodePudding user response:
I couldn't find a library or a function so I write it down below.
There should exist a function but I can't remember it.
Anyway the code below work fine even if "lack pattern".
d<-data.frame(rep(1:3,each=5), strsplit(strrep("ABCDE",3),""), 1:15)
colnames(d)<-c("Enrollment_ID","Question","Answer")
d
# Enrollment_ID Question Answer
# 1 1 A 1
# 2 1 B 2
# 3 1 C 3
# : : : :
tbl <- function(d){
d<-d[order(d[,1], d[,2]), ] # sort dataframe
res<-c() # vector for result
rh<-sort(unique(d$Enrollment_ID)) # row header
ch<-sort(unique(d$Question)) # col header
for(r in rh){
for(c in ch){
v<-d[d[,1] == r & d[,2] == c, 3]
res<-c(res, ifelse(length(v)==0, "", v))
} # replace unmatched value to empty string
}
return(matrix(res, ncol=length(ch), byrow=T, dimnames=list(rh, ch)))
}
tbl(d)
# A B C D E
# 1 1 2 3 4 5
# 2 6 7 8 9 10
# 3 11 12 13 14 15
# even if we remove the first record of d, tbl func works fine
tbl(d[-1,])
# A B C D E
# 1 "" "2" "3" "4" "5"
# 2 "6" "7" "8" "9" "10"
# 3 "11" "12" "13" "14" "15"
CodePudding user response:
That's precisely the kind of transformation task that pivot_wider
is best for:
library(tidyr)
df %>%
pivot_wider(names_from = Question,
values_from = Answer,
names_glue = "Question_{.name}")
# A tibble: 3 × 5
ID Question_A Question_B Question_C Question_D
<dbl> <chr> <chr> <chr> <chr>
1 1 xxx xxx xxx xxx
2 2 xxx xxx xxx xxx
3 3 xxx xxx xxx xxx
Data:
df <- data.frame(
ID = c(1,1,1,1,2,2,2,2,3,3,3,3),
Question = c("A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D"),
Answer = c(rep("xxx", 12))
)