How to plot the share of a variable in a histogram with ggplot2

Time:09-20

I've tried looking at old threads unsuccessfully. I'm trying to plot the share of male legislators in different parliamentary sessions in a histogram.

This is my code, which works but shows the number of legislators (NOT the share). How can I plot the share? Thanks!

mergedf %>%
  ggplot(aes(x = session, fill = factor(sex))) +
  geom_histogram(binwidth = 0.5) +
  theme_minimal() +
  theme(legend.position = "bottom") +
  labs(title = "Share of male legislators by session", x = "Session", y = "Share of legislators",
       fill = "sex")

Edit: I get the share of legislators with this table, but I don't know how to integrate it in the histogram.

mergedf %>%
  tabyl(session, sex) %>%
  adorn_percentages() %>%
  adorn_pct_formatting()

CodePudding user response:

One option would be to use some dplyr verbs to compute the counts and percentages, which can then be displayed as a bar chart (a histogram is something different) via geom_col, like so:

mergedf <- data.frame(
  sessions = c( 1, 2, 3, 4, 5, 2, 3, 4, 2),
  sex = c ("female", "female", "female", "male", "female", "female", "female", "male", "male")
)

library(dplyr)
library(ggplot2)

mergedf %>%
  group_by(sessions, sex) %>%
  summarise(n = n()) %>%
  mutate(pct = n / sum(n)) %>%
  ggplot(aes(x = factor(sessions), y = pct, fill = sex)) +
  geom_col(width = .6) +
  theme_minimal() +
  theme(legend.position = "bottom") +
  labs(title = "Share of male legislators by session", x = "Session", y = "Share of legislators",
       fill = "sex")
#> `summarise()` has grouped output by 'sessions'. You can override using the
#> `.groups` argument.
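
As a possible variation on the same idea, here is a minimal sketch that uses count() instead of group_by() plus summarise() (which avoids the grouping message) and formats the y axis as percentages. The object names shares and p are only illustrative, and scales::label_percent() assumes a reasonably recent version of the scales package, which is installed alongside ggplot2.

```r
library(dplyr)
library(ggplot2)

# Same toy data as in the answer above
mergedf <- data.frame(
  sessions = c(1, 2, 3, 4, 5, 2, 3, 4, 2),
  sex = c("female", "female", "female", "male", "female", "female", "female", "male", "male")
)

shares <- mergedf %>%
  count(sessions, sex) %>%        # one row per session/sex combination, counts in column n
  group_by(sessions) %>%
  mutate(pct = n / sum(n)) %>%    # share of each sex within its session
  ungroup()

p <- ggplot(shares, aes(x = factor(sessions), y = pct, fill = sex)) +
  geom_col(width = .6) +
  scale_y_continuous(labels = scales::label_percent()) +  # show 0.5 as 50%
  theme_minimal() +
  theme(legend.position = "bottom") +
  labs(title = "Share of legislators by session and sex",
       x = "Session", y = "Share of legislators", fill = "sex")
```

Calling print(p) (or just typing p in an interactive session) draws the chart. Within each session the pct values add up to 1, so every bar has the same total height.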

CodePudding user response:

You simply need to specify position="fill" in your geom_histogram parameters:

library(ggplot2)
mergedf <- data.frame(
  session = c( 1, 2, 3, 4, 5, 2, 3, 4, 2),
  sex = c ("female", "female", "female", "male", "female", "female", "female", "male", "male")
)

ggplot(mergedf, aes(x = session, fill = factor(sex))) +
  geom_histogram(binwidth = 0.5, position = "fill") +   # <- HERE
  theme_minimal() +
  theme(legend.position = "bottom") +
  labs(title = "Share of male legislators by session",
       x = "Session", y = "Share of legislators", fill = "sex")

Technically, you're not really building a histogram (binned distribution of counts) but a barplot, so you could alternatively use the geom_bar geom, with the same format:

ggplot(mergedf, aes(x = session, fill = factor(sex))) +
  geom_bar(position = "fill") +
  theme_minimal() +
  theme(legend.position = "bottom") +
  labs(title = "Share of male legislators by session",
       x = "Session", y = "Share of legislators", fill = "sex")
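
If only the male share is of interest (as the plot title suggests), one further option is to compute it directly and plot a single bar per session. This is a rough sketch rather than part of the answers above: it assumes dplyr is available and that sex is coded as the strings "male" and "female", as in the toy data. The names male_share and p_male are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Same toy data as in the answer above
mergedf <- data.frame(
  session = c(1, 2, 3, 4, 5, 2, 3, 4, 2),
  sex = c("female", "female", "female", "male", "female", "female", "female", "male", "male")
)

# mean() of a logical vector is the proportion of TRUE values,
# so this gives the share of male legislators in each session
male_share <- mergedf %>%
  group_by(session) %>%
  summarise(share_male = mean(sex == "male"))

p_male <- ggplot(male_share, aes(x = factor(session), y = share_male)) +
  geom_col(width = .6) +
  theme_minimal() +
  labs(title = "Share of male legislators by session",
       x = "Session", y = "Share of male legislators")
```

Printing p_male draws the chart. Sessions with no male legislators get a share of 0 and therefore show no visible bar.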