Displaying percentage of total gender group in each subgroup with ggplot and geom_text

Time:01-20

I've tried everywhere to find the answer to this question but I am still stuck, so here it is:

I have a data frame data_1 that contains data from an ongoing latent profile analysis. The variables of interest for this question are profiles and gender.

I would like to plot gender distribution by profile, but within each profile show what % of each gender we have compared to the entire sample of this gender. For example, if we have 10 women and 5 in Profile 1, I want the text on top of the women bar for Profile 1 to show 50%.

Right now I am using the following code but it is giving me the percentage for the entire population, while I just want the percentage compared to the total number of women.

ggplot(data = subset(data_1, !is.na(gender)),
       aes(x = gender, fill = gender)) +
  geom_bar() +
  facet_grid(cols = vars(profiles)) +
  theme_minimal() +
  scale_fill_brewer(palette = 'Accent', name = "Gender",
                    labels = c("Non-binary", "Man", "Woman")) +
  labs(x = "Gender", title = "Gender distribution per LPA profile") +
  geom_text(aes(y = ((..count..)/sum(..count..)),
                label = scales::percent((..count..)/sum(..count..))),
            stat = "count", vjust = -28)

Thanks in advance for your help!

I tried multiple alternatives including creating the variable within the dataset using summarize and mutate but with no success unfortunately.

CodePudding user response:

As untidy as it seems, it's likely the best approach to summarise outside of the ggplot2 call, which can be done like this:

library(tidyverse)

data1 <- tibble(gender = sample(c("male", "female"), 100, replace = TRUE),
                profile = sample(c("profile1", "profile2"), 100, replace = TRUE))

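data1 |>
  count(gender, profile) |>          # one row per gender/profile pair, count in n
  group_by(gender) |>
  mutate(perc = n / sum(n)) |>       # share of each gender's total, across profiles
  ggplot(aes(x = gender, y = n, fill = gender)) +
  geom_col() +
  facet_grid(~profile) +
  geom_text(aes(y = n + 3, label = scales::percent(perc)))  # label just above each bar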

facet_grid essentially groups the dataset by profile before any values are calculated, so each facet is blind to the data in the others. I think the only approach is therefore to summarise before the call and use geom_col (which defaults to stat = "identity") to draw the bars. Note that the y value for the labels is calculated from the count variable n, so the text is positioned relative to the height of each bar.
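As a quick sanity check on the "percentage within gender" calculation, here is a minimal base R sketch on made-up data. The data frame toy and its values are assumptions chosen to mirror the example in the question (10 women, 5 of them in profile 1); it is not the asker's data.

```r
# Hypothetical toy data: 10 women (5 in profile1), 4 men (1 in profile1).
toy <- data.frame(
  gender  = c(rep("woman", 10), rep("man", 4)),
  profile = c(rep("profile1", 5), rep("profile2", 5),
              "profile1", rep("profile2", 3))
)

# Cross-tabulate gender by profile, then divide each row by its row total,
# so every gender's shares add up to 1 across the profiles.
counts <- table(toy$gender, toy$profile)
within_gender <- prop.table(counts, margin = 1)

print(within_gender)
```

The woman/profile1 cell comes out as 0.5, which is the 50% the question asks for, and each row sums to 1. This is the same quantity that mutate(perc = n / sum(n)) produces after group_by(gender).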

Edit - actually no, there's a "simpler" way

I tell a lie, you can actually do it in the ggplot2 call, but it's a little messier:

data1 |>
  ggplot(aes(x = gender, fill = gender)) +
  geom_bar() +
  facet_grid(~ profile) +
  stat_count(aes(y = after_stat(count) + 2,
                 label = scales::percent(after_stat(count) /
                                         tapply(after_stat(count),
                                                after_stat(group),
                                                sum)[after_stat(group)]
                 )),
             geom = "text")

Code borrowed from here. The after_stat(group) part is accessing the grouped gender count across both facets. Today I learned something!
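To see what the tapply(...)[...] part is doing, here is a small sketch on plain vectors, outside ggplot2. The vectors count and group are made-up stand-ins for after_stat(count) and after_stat(group), with one entry per bar across both facets. Note that the positional indexing relies on the group ids being the integers 1, 2, ..., which is an assumption of this sketch.

```r
# Hypothetical per-bar values: two facets, two groups in each.
count <- c(5, 1, 5, 3)        # bar heights
group <- c(1L, 2L, 1L, 2L)    # group id of each bar

# Total per group across all bars: group 1 -> 10, group 2 -> 4.
totals <- tapply(count, group, sum)

# Index the totals by group id so each bar is divided by its own group's total.
share <- as.vector(count / totals[group])

print(share)
```

Each bar's share is its count divided by the total for its group over both facets, so the shares for a given group add up to 1.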
