Plot labels of the mean of the dependent variable on a stacked bar plot by two categorical variables

Time:03-08

I am using a gender pay gap dataset from Glassdoor.

I want to label the pink and blue bars with the mean total salary of female and male employees, separately for each job title.

Any help would be appreciated.

Answer:

For comparing salaries by gender, side-by-side bars seem more practical than stacked ones (as already pointed out in the comments), because both bars then start at zero and their lengths can be compared directly.
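A rough sketch of that side-by-side alternative (not part of the original answer): `position_dodge()` puts the two genders next to each other, and the text layer reuses the same dodge width so each label lines up with its own bar. The data frame `toy` and its salaries are made up for illustration and stand in for the summarised Glassdoor data; vertical bars are used here for simplicity.

```r
library(ggplot2)

# Hypothetical toy data with made-up salaries; NOT the Glassdoor dataset
toy <- data.frame(
  jobTitle  = rep(c("Data Scientist", "Driver"), each = 2),
  gender    = rep(c("Female", "Male"), times = 2),
  totalcomp = c(110000, 105000, 95000, 98000)
)
toy$totalcomp_label <- paste0(round(toy$totalcomp * 1e-3), "k")

p_dodge <- ggplot(toy, aes(x = jobTitle, y = totalcomp, fill = gender)) +
  # dodge instead of stack: both genders start at zero, side by side
  geom_col(position = position_dodge(width = 0.9), width = 0.9, color = "black") +
  # same dodge width so each label sits over its own bar;
  # y = totalcomp / 2 places the label halfway up the bar
  geom_text(aes(y = totalcomp / 2, label = totalcomp_label, group = gender),
            position = position_dodge(width = 0.9), color = "white") +
  scale_fill_manual(values = c("#FF66CC", "blue")) +
  scale_y_continuous(labels = scales::comma) +
  labs(x = "Job Title", y = "Mean Total Salary", fill = "Gender") +
  theme_bw()

print(p_dodge)
```

Because the two layers share the dodge width and the grouping by gender, the label for each gender is shifted by exactly the same amount as its bar.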

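Nevertheless, regarding the technical question of positioning the labels, here is one way to do it. The tricky part is finding the center of each stacked segment: the segment next to the axis is centered at half its own length, and the segment stacked on top of it at the full length of the first segment plus half its own.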
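The center computation can be sketched without any pivoting: within each bar, accumulate the segment lengths in stacking order with `cumsum()` and subtract half of each segment. This is a minimal sketch on a hypothetical `toy` table (made-up values in thousands), and it assumes ggplot2's default stacking, in which the last factor level (`Male`) sits next to the axis and the first (`Female`) on the outside, the same assumption behind `labelpos_M = Male/2` in the answer.

```r
library(dplyr)

# Hypothetical toy data: one row per stacked segment, made-up values (in $k)
toy <- data.frame(
  jobTitle  = c("Data Scientist", "Data Scientist", "Driver", "Driver"),
  gender    = factor(c("Female", "Male", "Female", "Male"),
                     levels = c("Female", "Male")),
  totalcomp = c(110, 105, 95, 98)
)

centers <- toy %>%
  group_by(jobTitle) %>%
  # order segments as ggplot2 stacks them by default: last level (Male) at the axis
  arrange(desc(gender), .by_group = TRUE) %>%
  # center of a segment = end of the stack so far minus half the segment
  mutate(labelpos = cumsum(totalcomp) - totalcomp / 2) %>%
  ungroup()

print(centers)
```

For the Data Scientist bar this gives 105 / 2 = 52.5 for Male and 105 + 110 / 2 = 160 for Female. With the answer's `df_summary`, the grouping would be `group_by(jobTitle, perfEval)`, and the approach works unchanged for more than two segments per bar.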

library(tidyverse)

df <- readr::read_csv("~/data.csv")

df_summary <- df %>% 
  group_by(gender, jobTitle, perfEval) %>%
  summarize(totalcomp = mean(basePay + bonus),
            totalcomp_label = paste0(round(totalcomp * 1e-3, 0), "k")) %>%
  ungroup() 

df_plot <- df_summary %>% 
  left_join(
    # the messy part: finding appropriate label positions (there may be a solution with fewer pivoting steps)
    df_summary %>%
      tidyr::pivot_wider(id_cols = c(jobTitle, perfEval), 
                         values_from = "totalcomp", names_from = "gender", values_fill = 0) %>%
      # Male is stacked next to the axis, Female on top of it
      dplyr::mutate(labelpos_M = Male/2, labelpos_F = Male + Female/2) %>% 
      tidyr::pivot_longer(c(Female, Male), names_to = "gender") %>%
      dplyr::mutate(
        labelpos = case_when(gender == "Male" ~ labelpos_M,
                             gender == "Female" ~ labelpos_F,
                             TRUE ~ NA_real_)
      ) %>%
      dplyr::select(jobTitle, perfEval, gender, labelpos),
    by = c("jobTitle", "perfEval", "gender")
  ) 

df_plot
# A tibble: 98 x 6
#   gender jobTitle       perfEval totalcomp totalcomp_label labelpos
#   <chr>  <chr>             <dbl>     <dbl> <chr>              <dbl>
# 1 Female Data Scientist        1   118479. 118k             164089.
# 2 Female Data Scientist        2   105040. 105k             140556.
# 3 Female Data Scientist        3   100275. 100k             149580.
# 4 Female Data Scientist        4    87633. 88k              127996.
# 5 Female Data Scientist        5   101449. 101k             142046.


df_plot %>%
  ggplot() +
  geom_col(aes(y = jobTitle, x = totalcomp, fill = gender), width = 0.9, color = "black") +
  theme_bw() +
  # jobTitle is on the y axis and the salary on the x axis
  labs(x = "Mean Total Salary", y = "Job Title", fill = "Gender") +
  theme(axis.title = element_text(size = 10, color = "blue"),
        axis.text = element_text(size = 8),
        legend.position = "top") +
  scale_fill_manual(values = c("#FF66CC", "blue")) +
  scale_x_continuous(labels = scales::comma) +
  facet_wrap( ~ perfEval) +
  # positioning the labels at the precomputed segment centers
  geom_text(aes(x = labelpos, y = jobTitle, label = totalcomp_label), 
            color = "white")
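ggplot2 can also compute the segment centers itself: `position_stack(vjust = 0.5)` stacks the text layer the same way as the bars and places each label halfway along its segment, which avoids the join entirely. This is a sketch of that swapped-in technique on a hypothetical `toy` data frame with made-up salaries; the labels stack in the same order as the bars because `fill = gender` is set in the global `aes()`, so both layers get the same grouping.

```r
library(ggplot2)

# Hypothetical toy data with made-up salaries; NOT the Glassdoor dataset
toy <- data.frame(
  jobTitle  = rep(c("Data Scientist", "Driver"), each = 2),
  gender    = rep(c("Female", "Male"), times = 2),
  totalcomp = c(110000, 105000, 95000, 98000)
)
toy$totalcomp_label <- paste0(round(toy$totalcomp * 1e-3), "k")

# fill = gender in the global aes() gives bars and labels the same grouping,
# so both layers are stacked in the same order
p_stack <- ggplot(toy, aes(y = jobTitle, x = totalcomp, fill = gender)) +
  geom_col(width = 0.9, color = "black") +
  # vjust = 0.5 puts each label halfway along its own stacked segment
  geom_text(aes(label = totalcomp_label),
            position = position_stack(vjust = 0.5), color = "white") +
  scale_fill_manual(values = c("#FF66CC", "blue")) +
  scale_x_continuous(labels = scales::comma) +
  labs(x = "Mean Total Salary", y = "Job Title", fill = "Gender") +
  theme_bw()

print(p_stack)
```

With the answer's data, adding `facet_wrap(~ perfEval)` should work the same way, since stacking is computed separately within each panel.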
