Bar plot of categorical variable (string) with multiple answers

This might be trivial but I can't figure it out and can't find it online. Let's say I survey people asking the reason they did something. Two options: reason 1 and reason 2. They can also pick both options.

data <- data.frame('reason'=c(rep('R1', 5),rep('R2', 3),rep('R1,R2', 4)))
data
   reason
1      R1
2      R1
3      R1
4      R1
5      R1
6      R2
7      R2
8      R2
9   R1,R2
10  R1,R2
11  R1,R2
12  R1,R2

I want to plot the answers, but only counting R1 and R2. That is, if they answered both R1 and R2, assign one count to each. The command

ggplot(data = data, aes(x = reason)) + geom_bar() + coord_flip()

would plot the multiple answer cases as a separate category.

What I want instead is R1 to have a count of 5 + 4 = 9 and R2 to have a count of 3 + 4 = 7, and no R1,R2 category.

I am interested in this because I have real data from a Qualtrics survey.

CodePudding user response：

Using tidyr::separate_rows you could split your reason column into multiple rows:

data <- data.frame('reason'=c(rep('R1', 5),rep('R2', 3),rep('R1,R2', 4)))

library(tidyr)
library(ggplot2)

data_sep <- data |> 
  separate_rows(reason)

ggplot(data = data_sep, aes(y = reason)) +
  geom_bar()
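If you want to confirm the counts before plotting, here is a minimal sketch in base R. It assumes the answers are separated by a plain comma with no surrounding whitespace, as in the example data; the names `reasons` and `counts` are just illustrative.

```r
data <- data.frame(reason = c(rep("R1", 5), rep("R2", 3), rep("R1,R2", 4)))

# Split every answer on commas and flatten the result into a single
# character vector holding one reason per element
reasons <- unlist(strsplit(as.character(data$reason), ",", fixed = TRUE))

# Tabulate: each multi-answer row contributes one count to every reason it lists
counts <- table(reasons)
print(counts)
```

For the example data this gives 9 for R1 and 7 for R2, which matches what the bar plot should show.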

CodePudding user response：

You need to do some data management first. Something like:

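library(dplyr)
library(tidyr)
library(ggplot2)

# One logical column per option, TRUE when the answer contains it
mutate(data, 
       R1=grepl('R1', reason),
       R2=grepl('R2', reason)) %>%
  select(-reason) %>%
  pivot_longer(everything(), names_to="reason") %>%
  filter(value) %>%   # keep only the options each respondent picked
  count(reason) %>%
  print() %>%         # show the counts, then pass them on to ggplot
  ggplot(aes(x=reason, y=n)) +
  geom_col() +
  coord_flip()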
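The same indicator idea can be sketched in base R when there are more than two options. This is only an illustration: the `options` vector is a hypothetical list of labels you would fill in yourself, and the substring match assumes no label is contained in another (searching for "R1" would also match "R10").

```r
data <- data.frame(reason = c(rep("R1", 5), rep("R2", 3), rep("R1,R2", 4)))

# Hypothetical list of answer labels; replace with the options in your survey
options <- c("R1", "R2")

# For each option, count the rows whose answer contains that label.
# fixed = TRUE treats the label as a literal string rather than a regex.
counts <- sapply(options, function(opt) {
  sum(grepl(opt, as.character(data$reason), fixed = TRUE))
})
counts
```

`counts` is a named vector, so it can be passed directly to `barplot(counts, horiz = TRUE)` or turned into a data frame for `geom_col()`.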