Using `scale_y_discrete` to include variables in label names reorders labels without reordering the

Time:10-23

Assume I have the following data:

library(tidyverse)

set.seed(123)

col <- tibble(
  name = toupper(letters[1:20]),
  share = round(rnorm(20, 0.5, 0.1), 2),
  active_days = sample.int(50, 20)
)

Plotting this without scale_y_discrete() gives:

col %>%
  ggplot(aes(y = reorder(name, share), x = share)) +
  geom_col() +
  labs(x = "Share",
       y = "Merchant")


I want to add the value of active_days to each merchant's label in the plot. This can be achieved using scale_y_discrete:

col %>%
  ggplot(aes(y = reorder(name, share), x = share)) +
  geom_col() +
  labs(x = "Share",
       y = "Merchant") +
  scale_y_discrete(labels = paste0(col$name, " (", col$active_days, ")"))


However, adding scale_y_discrete() puts the labels in reverse alphabetical order from top to bottom, while the bars are still drawn in descending order of share. This gives the impression that merchant T has the highest share, when it is in fact merchant P. That is obviously unwanted. Does anyone know what is happening here, and how it can be fixed?

CodePudding user response:

Because of the reordering, the row order in the original data frame no longer matches the order the discrete scale uses for the categories. If the data has no duplicated names, you can match each axis label against the original data to look up the corresponding value.

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.1.1
#> Warning: package 'tidyr' was built under R version 4.1.1
#> Warning: package 'readr' was built under R version 4.1.1

set.seed(123)

col <- tibble(
  name = toupper(letters[1:20]),
  share = round(rnorm(20, 0.5, 0.1), 2),
  active_days = sample.int(50, 20)
)

col %>%
  ggplot(aes(y = reorder(name, share), x = share)) +
  geom_col() +
  labs(x = "Share",
       y = "Merchant") +
  scale_y_discrete(
    labels = ~ paste0(
      .x, " (", col$active_days[match(.x, col$name)], ")"
    )
  )

Created on 2021-10-22 by the reprex package (v2.0.1)
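
To see the mismatch without plotting, here is a minimal base-R sketch. It assumes the same simulated data as above; the names `df`, `axis_order`, `positional`, `lab_fun`, and `matched` are illustrative and not part of the original post. It compares labels built in data order with labels looked up via match().

```r
# Illustrative sketch; `df`, `lab_fun`, etc. are hypothetical names.
set.seed(123)
df <- data.frame(
  name = toupper(letters[1:20]),
  share = round(rnorm(20, 0.5, 0.1), 2),
  active_days = sample.int(50, 20)
)

# Order in which the discrete scale sees the categories (the factor levels).
axis_order <- levels(reorder(df$name, df$share))

# Positional labels: built in data order (A, B, C, ...), so they are
# attached to the levels by position, not by name.
positional <- paste0(df$name, " (", df$active_days, ")")

# Matched labels: look up each axis value in the data before pasting.
lab_fun <- function(x) {
  paste0(x, " (", df$active_days[match(x, df$name)], ")")
}
matched <- lab_fun(axis_order)

# Side-by-side view: the level each label is attached to, and the label text.
comparison <- data.frame(level = axis_order,
                         positional = positional,
                         matched = matched)
print(comparison)
```

In the printed table the `matched` column always starts with the name in `level`, while the `positional` column generally does not. That is the same effect as the wrong labels in the plot.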

CodePudding user response:

The answer from @teunbrand is exactly correct.

If your situation allows, it might be easier to just set the factor levels before plotting.

library(tidyverse)

set.seed(123)

col <- tibble(
  name = toupper(letters[1:20]),
  share = round(rnorm(20, 0.5, 0.1), 2),
  active_days = sample.int(50, 20)
)

col %>%
  mutate(breaks = paste0(name, "(", active_days, ")" )) %>% 
  mutate(breaks = fct_reorder(breaks, share)) %>% 
  ggplot(aes(y = breaks, x = share)) +
  geom_col() +
  labs(x = "Share",
       y = "Merchant")

Created on 2021-10-22 by the reprex package (v2.0.1)
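
As a rough check of this approach, the sketch below builds the combined label first and then orders it by share, using base R's reorder() in place of fct_reorder(). With one row per label the two should give the same ordering, but treat that as an assumption. The names `df` and `label` are illustrative only.

```r
# Illustrative sketch; `df` and `label` are hypothetical names.
set.seed(123)
df <- data.frame(
  name = toupper(letters[1:20]),
  share = round(rnorm(20, 0.5, 0.1), 2),
  active_days = sample.int(50, 20)
)

# Build the full label first, then order its levels by share.
df$label <- reorder(paste0(df$name, " (", df$active_days, ")"), df$share)

# The label levels, stripped back to the bare name, follow the same
# order as the names reordered by share.
label_names <- sub(" \\(.*$", "", levels(df$label))
name_order <- levels(reorder(df$name, df$share))
print(identical(label_names, name_order))
```

Because the label itself is the factor, text and bar cannot drift apart. The trade-off is that two rows sharing both name and active_days would collapse into a single level.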
