Can someone help me understand why my ggplot axis gets distorted?

Time:09-08

When I run this code:

library(dplyr)
library(ggplot2)

# Percentage of each condition within each core category, as a label string
C_df <- as.data.frame(C)
C_df <- C_df %>%
  group_by(core) %>%
  mutate(percent_weight = paste0(round(Freq / sum(Freq), 2) * 100, "%"))

ggplot(C_df, aes(fill = condition, y = Freq, x = core)) +
  geom_bar(position = "fill", stat = "identity") +
  theme_classic() +
  scale_y_continuous(expand = c(0, 0), labels = scales::percent) +
  labs(y = "Proportion of Cohort (%)") +
  geom_col(colour = "black", position = "fill") +
  scale_fill_brewer(palette = "Pastel1")

I can produce this graph:

[Image: stacked bar chart with the y axis running from 0% to 100%]

However, when I add one line to put the percentages inside the bars:

C_df <- as.data.frame(C)
C_df <- C_df %>%
  group_by(core) %>%
  mutate(percent_weight = paste0(round(Freq / sum(Freq), 2) * 100, "%"))

ggplot(C_df, aes(fill = condition, y = Freq, x = core)) +
  geom_bar(position = "fill", stat = "identity") +
  theme_classic() +
  scale_y_continuous(expand = c(0, 0), labels = scales::percent) +
  labs(y = "Proportion of Cohort (%)") +
  geom_col(colour = "black", position = "fill") +
  scale_fill_brewer(palette = "Pastel1") +
  # the added line: percentage labels inside the bars
  geom_text(aes(label = percent_weight), colour = "white", size = 3, position = position_dodge(.9))

I get this graph:

[Image: the same chart with a distorted y axis]

And I can't for the life of me figure out why.

Dummy Data:

Core           Condition  Freq  percent_weight
Core           TRUE          8  8%
non-Core       TRUE          8  8%
Postcode Core  TRUE          6  6%
Postcode non   TRUE          8  8%
Core           FALSE        92  92%
non-Core       FALSE        92  92%
Postcode Core  FALSE        94  94%
Postcode non   FALSE        92  92%
Postcode non FALSE 92 92%


Answer:

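The issue is that you use position = "fill" for the bars but position_dodge() for the labels. With position = "fill", the bars are stacked and each stack is rescaled so that it sums to 1, so the Freq values mapped on y end up in the interval 0 to 1. The labels are dodged rather than filled, so they are placed at the unscaled Freq values (up to 94). Because both layers share one y scale, the axis has to stretch to cover the raw counts as well, which is why the y axis in the second plot differs from the one in the first. To fix that, use position_fill() in geom_text() too, with vjust = 0.5 to centre each label within its bar segment: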

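library(ggplot2)

ggplot(C_df, aes(fill = Condition, y = Freq, x = Core)) +
  theme_classic() +
  # percent labels on the axis; position = "fill" puts the bars on a 0..1 scale
  scale_y_continuous(expand = c(0, 0), labels = scales::percent) +
  labs(y = "Proportion of Cohort (%)") +
  geom_col(colour = "black", position = "fill") +
  scale_fill_brewer(palette = "Pastel1") +
  # same position adjustment as the bars, so the labels use the same 0..1 scale;
  # vjust = 0.5 centres each label vertically within its segment
  geom_text(aes(label = percent_weight), colour = "white", size = 3,
            position = position_fill(vjust = 0.5))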
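To see the mismatch described above in numbers, here is a minimal base-R sketch (no ggplot2 needed) that mimics what position = "fill" does to the y values of the bars: within each Core category, every Freq is divided by that category's total. The helper column `filled` is hypothetical and only used for illustration; ggplot2 does this rescaling internally.

```r
# Dummy data from the question
C_df <- data.frame(
  Core      = rep(c("Core", "non-Core", "Postcode Core", "Postcode non"), times = 2),
  Condition = rep(c(TRUE, FALSE), each = 4),
  Freq      = c(8, 8, 6, 8, 92, 92, 94, 92)
)

# Roughly what position = "fill" does for the bars: divide each Freq by the
# total Freq of its Core category, so every stack sums to 1.
# `filled` is a hypothetical helper column, not something ggplot2 creates.
C_df$filled <- ave(C_df$Freq, C_df$Core, FUN = function(f) f / sum(f))

tapply(C_df$filled, C_df$Core, sum)  # every stack sums to 1
range(C_df$filled)                   # bar values: between 0 and 1
range(C_df$Freq)                     # dodged label y values: raw counts
```

Because the dodged labels sit at y values up to 94 while the filled bars stop at 1, a single y scale has to span both ranges, which is what distorts the axis. Using position_fill() for the labels puts both layers on the same 0 to 1 scale.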

DATA

C_df <- data.frame(
  Core = c(
    "Core",
    "non-Core", "Postcode Core", "Postcode non", "Core",
    "non-Core", "Postcode Core", "Postcode non"
  ),
  Condition = c(
    TRUE, TRUE,
    TRUE, TRUE, FALSE, FALSE, FALSE, FALSE
  ),
  Freq = c(8L, 8L, 6L, 8L, 92L, 92L, 94L, 92L),
  percent_weight = c(
    "8%", "8%",
    "6%", "8%", "92%", "92%", "94%", "92%"
  )
)