Stacked Bar Chart in ggplot2 not matching raw data (Likert-score Data pre and post-test)

I have seen a few posts about this, but none of them quite matched my issue. I am making a stacked bar chart of survey data on a scale from strongly disagree to strongly agree (I want strongly disagree at the top and strongly agree at the bottom). The question was asked of police in 5 departments, and the chart shows the proportion of each answer before and after. This is what I have right now. My original dataset has more rows, but the data below is a reproducible subset that I can share publicly. Each letter represents a police department.

library(ggplot2)

ggplot(response, aes(x=forcats::fct_infreq(condition), y=att_enable, group=att_enable, fill=factor(att_enable_fmt, levels=c("Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree")))) +
  geom_bar(position="fill", stat="identity") +
  guides(fill=guide_legend(title="")) + #modify title of legend
  scale_y_continuous(labels=scales::percent, expand = c(0, 0)) +
  theme_light() +
  xlab(" ") + ylab("") +
  theme(legend.text=element_text(face="bold"), axis.text=element_text(face="bold", colour="black"), axis.title=element_text(face="bold", colour="black")) +
  facet_wrap(~police_dept_fmt, nrow=1) +
  theme(plot.title=element_text(hjust=0.5, face="bold", size=15))

This visual looks great, BUT the proportions of strongly disagree, disagree, etc. do NOT match my raw data. For example, in police department C during the 'Before' period, 50% of respondents (3 out of 6) strongly disagreed and 50% were neutral, but the stacked bar chart shows that only about 25% strongly disagreed (the third column from the right). I made sure my raw data and formatting match. For full disclosure, the n's differ between groups, but we are making all bars the same height for this visual and reporting the n's in the manuscript.

Thank you for your help!

Here is some reproducible data:

response <- structure(list(
  att_enable = structure(c(
    3, 3, 1, 1, 3, 4, 2, 1, 3, 4, 1, 1, 2, 2, 2, 2, 2, 2, 3, 1, 1, 1, 2, 1, 3, 2, 3, 3, 2, 3
  ), label = "att_enable_pre"),
  att_waste = structure(c(3, 3, 1, 1, 3, 3, 2, 1, 3, 4, 1, 1, 2, 1, 3, 3, 2, 2, 3, 2, 1, 1, 2, 1, 3, 2, 1, 1, 2, 3), label = "att_waste_pre"),
  att_fear = structure(c(
    3, 3, 2, 1, 3, 2, 2, 1, 3, 4, 1, 1, 2, 2, 2, 2, 2, 2, 3, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 1
  ), label = "att_fear_pre"),
  att_delay = structure(c(3, 3, 1, 1, 3, 2, 2, 1, 3, 4, 1, 1, 2, 2, 2, 2, 2, 2, 3, 2, 1, 1, 2, 1, 3, 2, 3, 1, 2, 1), label = "att_delay_pre"),
  police_dept_fmt = c("B", "B", "C", "C", "B", "B", "B", "B", "C", "C", "D", "D", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "B", "C", "C", "D", "D", "B", "B"), condition = c("Before", "After", "Before", "After", "Before", "After", "Before", "After", "Before", "After", "Before", "After", "Before", "After", "Before", "After", "Before", "After", "Before", "Before", "After", "Before", "After", "Before", "Before", "After", "Before", "After", "Before", "After"),
  att_enable_fmt = c("Neutral", "Neutral", "Strongly Disagree", "Strongly Disagree", "Neutral", "Agree", "Disagree", "Strongly Disagree", "Neutral", "Agree", "Strongly Disagree", "Strongly Disagree", "Disagree", "Disagree", "Disagree", "Disagree", "Disagree", "Disagree", "Neutral", "Strongly Disagree", "Strongly Disagree", "Strongly Disagree", "Disagree", "Strongly Disagree", "Neutral", "Disagree", "Neutral", "Neutral", "Disagree", "Neutral"),
  att_waste_fmt = c("Neutral", "Neutral", "Strongly Disagree", "Strongly Disagree", "Neutral", "Neutral", "Disagree", "Strongly Disagree", "Neutral", "Agree", "Strongly Disagree", "Strongly Disagree", "Disagree", "Strongly Disagree", "Neutral", "Neutral", "Disagree", "Disagree", "Neutral", "Disagree", "Strongly Disagree", "Strongly Disagree", "Disagree", "Strongly Disagree", "Neutral", "Disagree", "Strongly Disagree", "Strongly Disagree", "Disagree", "Neutral"),
  att_delay_fmt = c("Neutral", "Neutral", "Strongly Disagree", "Strongly Disagree", "Neutral", "Disagree", "Disagree", "Strongly Disagree", "Neutral", "Agree", "Strongly Disagree", "Strongly Disagree", "Disagree", "Disagree", "Disagree", "Disagree", "Disagree", "Disagree", "Neutral", "Disagree", "Strongly Disagree", "Strongly Disagree", "Disagree", "Strongly Disagree", "Neutral", "Disagree", "Neutral", "Strongly Disagree", "Disagree", "Strongly Disagree"),
  att_fear_fmt = c("Neutral", "Neutral", "Disagree", "Strongly Disagree", "Neutral", "Disagree", "Disagree", "Strongly Disagree", "Neutral", "Agree", "Strongly Disagree", "Strongly Disagree", "Disagree", "Disagree", "Disagree", "Disagree", "Disagree", "Disagree", "Neutral", "Disagree", "Strongly Disagree", "Disagree", "Disagree", "Strongly Disagree", "Disagree", "Disagree", "Strongly Disagree", "Strongly Disagree", "Disagree", "Strongly Disagree")
),
row.names = c(NA, -30L), class = c("tbl_df", "tbl", "data.frame"),
na.action = structure(c(`20` = 20L, `26` = 26L, `73` = 73L, `74` = 74L, `171` = 171L, `196` = 196L, `224` = 224L, `343` = 343L, `345` = 345L), class = "omit")
)
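A quick way to confirm what each bar should show is to tabulate the raw proportions per department and period. This is a minimal base-R sketch, assuming you only want to check the numbers rather than plot them; it copies the `police_dept_fmt`, `condition` and `att_enable_fmt` columns from the data above into plain vectors (the names `dept` and `answer` are just illustrative) so it runs on its own. With the full `response` object you would pass those columns directly.

```r
# Three columns copied from the reproducible subset above
dept <- c("B", "B", "C", "C", "B", "B", "B", "B", "C", "C", "D", "D", "B", "B",
          "B", "B", "B", "B", "C", "C", "C", "C", "C", "B", "C", "C", "D", "D",
          "B", "B")
condition <- c("Before", "After", "Before", "After", "Before", "After",
               "Before", "After", "Before", "After", "Before", "After",
               "Before", "After", "Before", "After", "Before", "After",
               "Before", "Before", "After", "Before", "After", "Before",
               "Before", "After", "Before", "After", "Before", "After")
answer <- factor(
  c("Neutral", "Neutral", "Strongly Disagree", "Strongly Disagree", "Neutral",
    "Agree", "Disagree", "Strongly Disagree", "Neutral", "Agree",
    "Strongly Disagree", "Strongly Disagree", "Disagree", "Disagree",
    "Disagree", "Disagree", "Disagree", "Disagree", "Neutral",
    "Strongly Disagree", "Strongly Disagree", "Strongly Disagree", "Disagree",
    "Strongly Disagree", "Neutral", "Disagree", "Neutral", "Neutral",
    "Disagree", "Neutral"),
  levels = c("Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree")
)

# Count answers per department x period x category, then divide by the total
# of each department x period combination (one row of the chart = one bar)
counts <- table(dept = dept, condition = condition, answer = answer)
props <- prop.table(counts, margin = c(1, 2))

# Department C, "Before": Strongly Disagree 0.5, Neutral 0.5, every other category 0
round(props["C", "Before", ], 2)
```

These per-bar proportions are what a correct `position = "fill"` chart should display.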

Answer:

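The issue is simply that you use stat="identity" in geom_bar. Together with y=att_enable, this stacks the numeric codes stored in your att_enable column instead of one unit per respondent, i.e. you plot a weighted count of categories where the weights are the values of att_enable: a "Neutral" response gets a weight of 3, while a "Strongly Disagree" response gets a weight of only 1. position="fill" then rescales these weighted totals to 100%, which is why you do not get the right percentages (or counts).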
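The effect can be reproduced by hand for department C in the 'Before' period, which in the data above consists of 3 "Strongly Disagree" answers (coded 1) and 3 "Neutral" answers (coded 3). The small base-R sketch below compares the weighted share that stat="identity" with y=att_enable effectively draws against the plain per-respondent share that stat="count" draws; the vector names are illustrative only.

```r
# Department C, "Before": the six att_enable codes and their labels, in data order
codes  <- c(1, 3, 3, 1, 1, 3)
labels <- c("Strongly Disagree", "Neutral", "Neutral",
            "Strongly Disagree", "Strongly Disagree", "Neutral")

# stat = "identity" stacks the codes themselves, so each category's segment is
# the sum of its codes; position = "fill" then divides by the bar's total (12)
weighted_share <- tapply(codes, labels, sum) / sum(codes)

# stat = "count" (the geom_bar default) stacks one unit per respondent
count_share <- table(labels) / length(labels)

weighted_share  # Neutral 0.75, Strongly Disagree 0.25
count_share     # Neutral 0.5,  Strongly Disagree 0.5
```

The 0.25 from the weighted version is exactly the "about 25%" seen in the original chart.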

Instead, use the default stat="count" and do not map anything to the y aesthetic:

library(ggplot2)

ggplot(response, aes(x = forcats::fct_infreq(condition), fill = factor(att_enable_fmt, levels = c("Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree")))) +
  geom_bar(position = "fill") + # default stat = "count": one unit per respondent
  guides(fill = guide_legend(title = "")) + # modify title of legend
  scale_y_continuous(labels = scales::percent, expand = c(0, 0)) +
  theme_light() +
  xlab(" ") +
  ylab("") +
  theme(legend.text = element_text(face = "bold"), axis.text = element_text(face = "bold", colour = "black"), axis.title = element_text(face = "bold", colour = "black")) +
  facet_wrap(~police_dept_fmt, nrow = 1) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 15))
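To double-check the corrected chart against the raw data, ggplot2's `layer_data()` returns the data computed for a layer after the stat and position steps, so the stacked `ymin`/`ymax` already reflect `position = "fill"`. The sketch below assumes ggplot2 is installed and rebuilds a stripped-down version of the plot from the department C rows only (copied from the data above into a hypothetical `dept_c` data frame, with the styling lines left out); with the full data you would pass the plot object from the answer instead.

```r
library(ggplot2)

# Department C rows from the reproducible data above, in data order
dept_c <- data.frame(
  condition = factor(c("Before", "After", "Before", "After", "Before",
                       "Before", "After", "Before", "After", "Before", "After"),
                     levels = c("Before", "After")),
  att_enable_fmt = factor(
    c("Strongly Disagree", "Strongly Disagree", "Neutral", "Agree", "Neutral",
      "Strongly Disagree", "Strongly Disagree", "Strongly Disagree", "Disagree",
      "Neutral", "Disagree"),
    levels = c("Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"))
)

# Same idea as the fix: no y aesthetic, default stat = "count"
p <- ggplot(dept_c, aes(x = condition, fill = att_enable_fmt)) +
  geom_bar(position = "fill")

# Computed data for the first (and only) layer: one row per stacked segment
d <- layer_data(p, 1)

# Segment height after position = "fill" is the share of respondents in that bar
d$share <- d$ymax - d$ymin
d[, c("x", "count", "share")]
```

For the 'Before' bar the two segments should each have a count of 3 and a share of 0.5, matching the raw data.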