Calculating percentages within category using geom_col

Time:04-16

This has been asked in various forms before, but I'm attempting it in a slightly different way and can't seem to get it exactly right. When I use this code:

d %>% 
  drop_na(attend) %>% 
  count(race, attend) %>% 
  group_by(race) %>%
  mutate(percent = n/sum(n)*100) %>% 
  ggplot(aes(race, percent, fill = race)) +
  geom_col(position = "dodge")

I get this figure:

enter image description here

The 'attend' variable is just 0s and 1s, and I want to display the percent of 1s within each race. I think that those lines that are showing up inside the charts are actually correct, but what's going on with the rest of those columns? I can't quite figure out that last step.

CodePudding user response:

To get the result you want, filter the data for attend == 1 after computing the percentages.

Note: The black lines appear because of overplotting. Even with position = "dodge", the bars for attend = 0 and attend = 1 fall in the same group (race), so they are not dodged and are drawn on top of each other.
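To see where the extra bars come from, it can help to look at the summarised data before it reaches ggplot. The sketch below uses a tiny made-up data frame (d_small and its values are illustrative, not the asker's data) and shows that each race ends up with two rows, one per attend value, whose percentages sum to 100.

```r
library(dplyr)

# Hypothetical toy data: two races, attend coded as 0/1
d_small <- data.frame(
  race   = c("A", "A", "A", "A", "B", "B"),
  attend = c(1, 0, 0, 0, 1, 0)
)

# Same summary steps as in the question
pct <- d_small %>%
  count(race, attend) %>%
  group_by(race) %>%
  mutate(percent = n / sum(n) * 100) %>%
  ungroup()

# One row per race/attend combination, so geom_col() draws two bars per race
print(pct)

# Within each race the two percentages add up to 100
totals <- pct %>%
  group_by(race) %>%
  summarise(total = sum(percent))
print(totals)
```

Because both rows of a race share the same x position and fill, the two bars overlap, which is consistent with the lines seen inside the columns.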

Using some random example data:

library(tidyr)
library(dplyr)
library(ggplot2)

set.seed(123)

d <- data.frame(
  race = sample(c("Asian", "White", "Hispanic", "Black", "Other"), 100, replace = TRUE),
  attend = sample(0:1, 100, replace = TRUE)
)

d %>% 
  drop_na(attend) %>% 
  count(race, attend) %>% 
  group_by(race) %>%
  mutate(percent = n/sum(n)*100) %>% 
  filter(attend == 1) %>%
  ggplot(aes(reorder(race, percent), percent, fill = race)) +
  geom_col()
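As a possible alternative, and assuming attend really contains only 0s and 1s as described in the question, the percentage of 1s within each race is just the group mean times 100. The sketch below reuses the random example data and compares the two approaches; the names by_count and by_mean are made up for illustration.

```r
library(tidyr)
library(dplyr)

set.seed(123)

d <- data.frame(
  race = sample(c("Asian", "White", "Hispanic", "Black", "Other"), 100, replace = TRUE),
  attend = sample(0:1, 100, replace = TRUE)
)

# Approach from the answer: count, compute within-race shares, keep attend == 1
by_count <- d %>%
  drop_na(attend) %>%
  count(race, attend) %>%
  group_by(race) %>%
  mutate(percent = n / sum(n) * 100) %>%
  ungroup() %>%
  filter(attend == 1) %>%
  select(race, percent)

# Shortcut (assumes attend is strictly 0/1): the mean of a 0/1 variable is the share of 1s
by_mean <- d %>%
  drop_na(attend) %>%
  group_by(race) %>%
  summarise(percent = mean(attend) * 100)

# Compare the two results race by race
both <- merge(by_count, by_mean, by = "race", suffixes = c("_count", "_mean"))
print(both)
```

Either summary can be piped into ggplot(aes(reorder(race, percent), percent, fill = race)) + geom_col() as above. One difference to keep in mind: a race with no attend == 1 rows disappears after the filter step but shows up as 0 with the mean-based version.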

