plotting the proportion of occurrence of a categorical variable in a sample

Time:04-03

I have a variable in a dataset called "gender" that can take the values "m" or "f". I want to plot the proportion of "m" in the sample. I have tried something similar to the following, but this approach seems to be meant for two variables, not for one. Any ideas?

ggplot(df, aes(x = gender, fill = gender)) +
  geom_bar(position = "fill")

Thank you

CodePudding user response:

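If you want to show the proportions of each of your categories, then I would suggest computing the proportions manually instead of relying on position = "fill". With a single variable, position = "fill" stretches every bar to a height of 1, so the proportions are not visible.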

One approach would be to compute the proportions on the fly using after_stat() and the counts that geom_bar() computes under the hood, like so:

library(ggplot2)

ggplot(mtcars, aes(x = factor(cyl), fill = factor(cyl))) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = scales::percent)
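
Applied to the data from the question, the same idea might look like the sketch below. The question does not show the real data, so the data frame df here is a made-up stand-in with a single gender column; only the column name and its values "m" and "f" are taken from the question.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data: 3 x "m", 2 x "f"
df <- data.frame(gender = c("m", "f", "m", "m", "f"))

# Each bar's height is its count divided by the total count,
# so the bars show shares of the whole sample
p <- ggplot(df, aes(x = gender, fill = gender)) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = scales::percent)

p
```

With this toy data the "m" bar should reach 60% and the "f" bar 40%.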

A second approach would be to aggregate your data before passing it to ggplot like so:

library(dplyr)

mtcars |> 
  count(cyl) |> 
  mutate(pct = n / sum(n)) |> 
  ggplot(aes(x = factor(cyl), fill = factor(cyl))) +
  geom_col(aes(y = pct)) +
  scale_y_continuous(labels = scales::percent)
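
If you only need the number rather than a plot, the proportion of "m" can also be read off directly in base R. This is a minimal sketch on a made-up data frame df (the real data is not shown in the question); prop_m and props are just illustrative names.

```r
# Hypothetical stand-in for the asker's data: 3 x "m", 2 x "f"
df <- data.frame(gender = c("m", "f", "m", "m", "f"))

# Share of every category at once
props <- prop.table(table(df$gender))

# Share of "m" only: the mean of a logical vector is the fraction of TRUEs
prop_m <- mean(df$gender == "m")

print(props)
print(prop_m)  # 0.6 for this toy data
```

Note that mean() returns NA if gender contains missing values; passing na.rm = TRUE drops them from the calculation.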
