How would I make a grouped barchart for a multiple-answer survey question?

Time:06-10

I have a survey data set that I have been analyzing and visualizing. It contains a couple of "check all that apply" questions. For example, question 3 has a summary variable Q3_YN_Num (where 0 == didn't select any answer, 1 == selected at least one answer), as well as an individual Q3X variable for each potential answer (where 0 == didn't select that specific answer, 1 == selected that specific answer).

That section of the dataset looks kind of like this:

Q3A Q3B Q3C Q3D Q3E Q3_YN_Num
1 0 0 0 0 1
0 0 0 0 0 0
1 1 0 0 0 1
1 0 0 1 0 1
0 0 1 0 0 1
... ... ... ... ... ...
n n n n n n

I have been able to make bar graphs of each individual Q3X variable with my own R code to show whether or not each individual response was selected, like this one:

Q3AN <- dataframe$Q3A
library(ggplot2)
ggplot(dataframe, aes(x=Q3AN)) +
  geom_bar(stat="count", fill="#782F40", colour="black") +
  theme_bw() +
  theme(panel.grid.major = element_blank(),panel.grid.minor = element_blank())

[Image: Basic bar graph]

However, I am stuck on how to combine the Q3X responses so that each one forms its own group, with separate bars for the 0 and 1 categories. I'm trying to make a bar graph for a presentation that would look somewhat like this sketch:

[Image: Quick grouped bar graph sketch]

My first attempt was with this code:

ggplot(dataframe, aes(x=Q3AN+Q3BN+Q3CN+Q3DN+Q3EN)) +
  geom_bar(stat="count", colour="black") +
  theme_bw() +
  theme(panel.grid.major = element_blank(),panel.grid.minor = element_blank())

However, all I've gotten is this:

[Image: Very odd bar graph]

So far, I've tried to split each Q3X response into "Unheard" and "Heard" groups with the following code:

dataframe$Grouping1[dataframe$Q3A==0] <-"Unheard"
dataframe$Grouping1[dataframe$Q3A==1] <-"Heard"
dataframe$Grouping2[dataframe$Q3B==0] <-"Unheard"
dataframe$Grouping2[dataframe$Q3B==1] <-"Heard"
dataframe$Grouping3[dataframe$Q3C==0] <-"Unheard"
dataframe$Grouping3[dataframe$Q3C==1] <-"Heard"
dataframe$Grouping4[dataframe$Q3D==0] <-"Unheard"
dataframe$Grouping4[dataframe$Q3D==1] <-"Heard"
dataframe$Grouping5[dataframe$Q3E==0] <-"Unheard"
dataframe$Grouping5[dataframe$Q3E==1] <-"Heard"
Groups <-c(dataframe$Grouping1, dataframe$Grouping2, dataframe$Grouping3, dataframe$Grouping4, dataframe$Grouping5)
Subgroup = c("Unheard","Heard")

This is where I get stuck: none of the code I have tried from here produces the graph I want, and sometimes it produces no graph at all.

For example, here's some of the code I have tried:

#Attempt 1
ggplot(dataframe, aes(x=Groups, fill=Subgroup) +
  geom_bar(stat="count", colour="black") +
  theme_bw() +
  theme(panel.grid.major = element_blank(),panel.grid.minor = element_blank())

#Attempt 2
ggplot(dataframe, aes(x=Groupings1 + Groupings2 + Groupings3 + Groupings4 + Groupings5, fill=Subgroup) +
  geom_bar(stat="count", colour="black") +
  theme_bw() +
  theme(panel.grid.major = element_blank(),panel.grid.minor = element_blank()) + )

However, when I've tried the above code or any modified versions thereof, I still cannot get it to work. I am absolutely at a loss and would like to know if there's an efficient way to do this in R.

Answer:

You could achieve your desired result by reshaping your data to long format, e.g. with tidyr::pivot_longer, so that you get one column with the questions or items as categories and another column with the responses.

Doing so makes it easy to plot your survey results via ggplot2 as you could map the new item column on x and color your bars by the type of response:

library(tidyr)
library(ggplot2)

# Convert to long format
df_long <- df |>
  pivot_longer(-Q3_YN_Num, names_to = "item", values_to = "response")

ggplot(df_long, aes(item, fill = factor(response))) +
  geom_bar(position = position_dodge(preserve = "single")) +
  scale_fill_manual(
    labels = c("0" = "Didn't select", "1" = "Selected"),
    values = c("0" = "red", "1" = "blue")
  )
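To see what the reshape produces before plotting, it can help to tabulate the long data. The following is only a sketch: it rebuilds the five-row example data inline so that it runs on its own, and the name `counts` is just an illustrative choice.

```r
library(tidyr)

# Five-row example data (same values as the sample rows in the question)
df <- data.frame(
  Q3A = c(1, 0, 1, 1, 0),
  Q3B = c(0, 0, 1, 0, 0),
  Q3C = c(0, 0, 0, 0, 1),
  Q3D = c(0, 0, 0, 1, 0),
  Q3E = c(0, 0, 0, 0, 0),
  Q3_YN_Num = c(1, 0, 1, 1, 1)
)

# One row per respondent and item: 5 respondents x 5 items = 25 rows
df_long <- pivot_longer(df, -Q3_YN_Num, names_to = "item", values_to = "response")

# Cross-tabulate item by response; these are the bar heights geom_bar() draws
counts <- table(df_long$item, df_long$response)
print(counts)
```

In this sample nobody selected Q3E, so that item has only a "0" bar. That is the case `position_dodge(preserve = "single")` handles: the lone bar keeps the same width as the bars in the other groups instead of stretching across the whole group.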

DATA

df <- structure(list(
  Q3A = c(1, 0, 1, 1, 0), Q3B = c(0, 0, 1, 0, 0),
  Q3C = c(0, 0, 0, 0, 1), Q3D = c(0, 0, 0, 1, 0), Q3E = c(
    0,
    0, 0, 0, 0
  ), Q3_YN_Num = c(1, 0, 1, 1, 1)
), row.names = c(
  NA,
  5L
), class = "data.frame")
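If you would rather have the "Unheard"/"Heard" labels and the styling from the question, one possible variant is to turn the response into a labelled factor before plotting. This is a sketch under assumptions: the grey fill for "Unheard" is an arbitrary choice, and the data is rebuilt inline so the block runs on its own.

```r
library(tidyr)
library(ggplot2)

# Five-row example data (same values as the sample rows in the question)
df <- data.frame(
  Q3A = c(1, 0, 1, 1, 0),
  Q3B = c(0, 0, 1, 0, 0),
  Q3C = c(0, 0, 0, 0, 1),
  Q3D = c(0, 0, 0, 1, 0),
  Q3E = c(0, 0, 0, 0, 0),
  Q3_YN_Num = c(1, 0, 1, 1, 1)
)

df_long <- pivot_longer(df, -Q3_YN_Num, names_to = "item", values_to = "response")

# Recode 0/1 as a factor so the legend shows the labels from the question
df_long$response <- factor(df_long$response,
                           levels = c(0, 1),
                           labels = c("Unheard", "Heard"))

p <- ggplot(df_long, aes(item, fill = response)) +
  geom_bar(position = position_dodge(preserve = "single"), colour = "black") +
  # "grey70" is an arbitrary choice; "#782F40" is the fill used in the question
  scale_fill_manual(values = c(Unheard = "grey70", Heard = "#782F40")) +
  theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

print(p)
```

Recoding once on the long data replaces the ten separate Grouping1 to Grouping5 assignments from the question, because all five items share a single response column after the reshape.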