I read in demographic data from a survey based data collection system. The data from multiple choice questions (select all that apply) comes in with each choice being a separate column. I merged them, which looked great until I tried to output graphs, and then I realized R counts each unique combination of variables rather than all of each variable. (ex, favorite color blue = 2, favorite color blue and red = 1; not blue =3). I tried again with the raw data to make a table with the just the id variable and variables I want to count.
data<-(record_id=c(1,2,3,4,5,6,7,8,9,10,11), choice_1 = c('blue','blue','blue','blue','blue',NA,NA,NA,NA,NA,NA),
choice_2 = c('red',NA,NA,NA,NA,'red',NA,NA,NA,NA,NA), choice_3=c(NA,'green','green',NA,NA,NA,'green','green',NA,NA,NA))
choices.df<-data%>%
dplyr::select(record_id,choice_1,choice_2,choice_3)%>%
group_by(choice_1,choice_2,choice_3)%>%
summarise(counted_choices = n())
The count column counts each rowwise combination as unique, Instead I want the output of the count to read 3 blue, 2 red, 4 green. How would I modify this code?
One thought i had was to merge and then string search using and if_else function or loop.
Any help is appreciated.
CodePudding user response:
Not sure if I understood what's your expected output, but as Limey suggested: pivot_longer()
from tidyr
will reshape your dataset into a format that's much easier to analyse.
data |>
tidyr::pivot_longer(cols = !record_id,
values_to = "colour") |>
dplyr::count(colour)
Output:
colour n
<chr> <int>
1 blue 5
2 green 4
3 red 2
4 NA 22