Home > front end >  R-counting multiple columns from a multiple choice output
R-counting multiple columns from a multiple choice output

Time:05-21

I read in demographic data from a survey based data collection system. The data from multiple choice questions (select all that apply) comes in with each choice being a separate column. I merged them, which looked great until I tried to output graphs, and then I realized R counts each unique combination of variables rather than all of each variable. (ex, favorite color blue = 2, favorite color blue and red = 1; not blue =3). I tried again with the raw data to make a table with the just the id variable and variables I want to count.

data<-(record_id=c(1,2,3,4,5,6,7,8,9,10,11), choice_1 = c('blue','blue','blue','blue','blue',NA,NA,NA,NA,NA,NA),
choice_2 = c('red',NA,NA,NA,NA,'red',NA,NA,NA,NA,NA), choice_3=c(NA,'green','green',NA,NA,NA,'green','green',NA,NA,NA))
choices.df<-data%>%
dplyr::select(record_id,choice_1,choice_2,choice_3)%>%
group_by(choice_1,choice_2,choice_3)%>%
summarise(counted_choices = n())

The count column counts each rowwise combination as unique, Instead I want the output of the count to read 3 blue, 2 red, 4 green. How would I modify this code?

One thought i had was to merge and then string search using and if_else function or loop.

Any help is appreciated.

CodePudding user response:

Not sure if I understood what's your expected output, but as Limey suggested: pivot_longer() from tidyr will reshape your dataset into a format that's much easier to analyse.

data |>
  tidyr::pivot_longer(cols = !record_id,
                      values_to = "colour") |> 
  dplyr::count(colour)

Output:

  colour     n
  <chr>  <int>
1 blue       5
2 green      4
3 red        2
4 NA        22
  • Related