I have a data set with hundreds of participant & control responses to 26 questions. Each participant has 26 questions associated with them where they answered Yes (1), No (-1), Maybe (0), or did not answer (NA)
For each participant, I want to sum all of their specific responses for all 26 questions and save it to a new column. So if they answered Yes (1) 12 out of 26 times, then the new column should have the number 12 in it -- ignoring the No (-1) values.
I have tried for loops, if else statements, sub setting, group by and sum, etc. I just can’t figure out how to loop through each of the 26 questions and sum only theirs — ignoring the other participants.
Edit: Here is a representative example of what the code would look like.
ID PatientResponse ControlResponse QuestionNumber
1 122047 1 0 1
2 123274 -1 -1 1
3 186223 1 1 1
4 122047 0 -1 2
5 123274 1 -1 2
6 186223 -1 0 2
Here is an image of what one question looks like for various participants: https://i.stack.imgur.com/ojGGO.png
Here is what I would like it to ideally look like after all 26 questions have been summed for each participant : https://i.stack.imgur.com/W6Qo3.png
CodePudding user response:
library(dplyr); library(tidyr)
# this will give the count of each kind of response in its own column
df %>%
count(Question, Participant, Control) %>%
pivot_wider(names_from = Control, values_from = n)
#if you just want Yes's counted
df %>%
group_by(Question, Participant) %>%
summarize(Summed_Yes_Responses = sum(Control == 1, na.rm = TRUE))