I've tried everywhere to find the answer to this question but I am still stuck, so here it is:
I have a data frame `data_1` that contains data from an ongoing latent profile analysis. The variables of interest for this question are `profiles` and `gender`.
I would like to plot the gender distribution by profile, but within each profile show what percentage of each gender it contains relative to the entire sample of that gender. For example, if we have 10 women in total and 5 of them are in Profile 1, I want the label on top of the women bar for Profile 1 to show 50%.
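To make the arithmetic concrete, here is a minimal base-R sketch on hypothetical toy data (the counts are invented to match the example above). The key idea is that the denominator is each gender's row total, which `prop.table()` gives with `margin = 1`:

```r
# Hypothetical toy data: 10 women (5 in Profile 1) and 4 men (3 in Profile 1)
toy <- data.frame(
  gender   = c(rep("Woman", 10), rep("Man", 4)),
  profiles = c(rep("Profile 1", 5), rep("Profile 2", 5),
               rep("Profile 1", 3), rep("Profile 2", 1))
)

# Cross-tabulate: rows are genders, columns are profiles
tab <- table(toy$gender, toy$profiles)

# margin = 1 divides each cell by its row (gender) total, i.e. the share
# of each gender's members that fall into each profile
within_gender <- prop.table(tab, margin = 1)
within_gender
```

Here `within_gender["Woman", "Profile 1"]` is 0.5, the 50% from the example, and every row sums to 1.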
Right now I am using the following code, but it gives me the percentage relative to the entire sample, whereas I just want the percentage relative to the total number of women.
```r
ggplot(data = subset(data_1, !is.na(gender)),
       aes(x = gender, fill = gender)) +
  geom_bar() +
  facet_grid(cols = vars(profiles)) +
  theme_minimal() +
  scale_fill_brewer(palette = 'Accent', name = "Gender",
                    labels = c("Non-binary", "Man", "Woman")) +
  labs(x = "Gender", title = "Gender distribution per LPA profile") +
  geom_text(aes(y = ((..count..)/sum(..count..)),
                label = scales::percent((..count..)/sum(..count..))),
            stat = "count", vjust = -28)
```
Thanks in advance for your help!
I tried multiple alternatives, including creating the variable within the dataset using `summarize` and `mutate`, but unfortunately with no success.
Answer:
As untidy as it seems, it's probably best to summarise outside of the `ggplot2` call, which can be done like this:
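```r
library(tidyverse)

# Example data: random gender and profile for 100 people
data1 <- tibble(gender = sample(c("male", "female"), 100, replace = TRUE),
                profile = sample(c("profile1", "profile2"), 100, replace = TRUE))

data1 |>
  count(gender, profile) |>    # one row per gender x profile, with count n
  group_by(gender) |>          # percentages relative to each gender's total
  mutate(perc = n / sum(n)) |>
  ggplot(aes(x = gender, y = n, fill = gender)) +
  geom_col() +
  facet_grid(~profile) +
  geom_text(aes(y = n + 3, label = scales::percent(perc)))  # label 3 units above each bar
```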
`facet_grid` essentially groups the dataset by `profile` before any values are calculated, so each facet is in effect blind to the data in the other facets. I think the only approach is thus to summarise before the call and use `geom_col` (which defaults to `stat = "identity"`) to make the plots. Note that the `y` value for the labels is calculated from the count column `n`, so the text is positioned relative to the height of each bar.
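For reference, here is a minimal sketch of the same summarise-first approach adapted to the column names from the question (`data_1`, `profiles`, `gender`). The data are simulated stand-ins, and the gender values are assumed to already be the labels you want displayed, so the `labels =` argument of `scale_fill_brewer()` is dropped. The NA filter mirrors the `subset(data_1, !is.na(gender))` in the question:

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for data_1: simulated values, with some NA genders
set.seed(1)
data_1 <- data.frame(
  gender   = sample(c("Non-binary", "Man", "Woman", NA), 200, replace = TRUE,
                    prob = c(0.1, 0.4, 0.4, 0.1)),
  profiles = sample(c("Profile 1", "Profile 2", "Profile 3"), 200, replace = TRUE)
)

gender_by_profile <- data_1 |>
  filter(!is.na(gender)) |>      # drop missing gender, as in the question
  count(profiles, gender) |>     # one row per profile x gender, with count n
  group_by(gender) |>            # denominator = each gender's total
  mutate(perc = n / sum(n)) |>
  ungroup()

p <- ggplot(gender_by_profile, aes(x = gender, y = n, fill = gender)) +
  geom_col() +
  geom_text(aes(label = scales::percent(perc, accuracy = 1)),
            vjust = -0.5) +      # place each label just above its bar
  facet_grid(cols = vars(profiles)) +
  scale_fill_brewer(palette = "Accent", name = "Gender") +
  labs(x = "Gender", title = "Gender distribution per LPA profile") +
  theme_minimal()
p
```

Because the grouping is by `gender` rather than by `profiles`, each gender's labels add up to 100% across the facets.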
Edit: actually no, there's a "simpler" way.
I tell a lie: you can actually do it in the `ggplot2` call, but it's a little messier:
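```r
data1 |>
  ggplot(aes(x = gender, fill = gender)) +
  geom_bar() +
  facet_grid(~ profile) +
  stat_count(aes(y = after_stat(count) + 2,   # place labels 2 units above bars
                 # divide each bar's count by the total count of its group
                 # (its gender), summed over all facets
                 label = scales::percent(after_stat(count) /
                                           tapply(after_stat(count),
                                                  after_stat(group),
                                                  sum)[after_stat(group)])),
             geom = "text")
```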
Code borrowed from here. The `after_stat(group)` part gives access to the counts grouped by `gender` across both facets. Today I learned something!
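To see what the `tapply(...)[after_stat(group)]` expression does, here is a base-R sketch on plain vectors. The counts are hypothetical and stand in for what `after_stat()` would see: two genders (groups 1 and 2), each appearing in two facets. Note that `totals[group]` indexes by position, which assumes the groups are numbered 1, 2, ... consecutively:

```r
# Hypothetical per-bar counts: bars 1 and 3 belong to group 1, bars 2 and 4 to group 2
count <- c(3, 5, 4, 6)
group <- c(1L, 2L, 1L, 2L)

# Total count per group, summed over all facets: group 1 -> 7, group 2 -> 11
totals <- tapply(count, group, sum)

# Indexing the totals by each bar's group lines them up with count,
# giving c(7, 11, 7, 11) as the per-bar denominators
share <- count / totals[group]
share
```

Each bar's share is its count divided by its own group's total, so the shares within each group sum to 1.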