What factors affect how ggplot legends are ordered?

I am creating a scatter plot in R where users can add or remove a horizontal line which shows a fixed reference value. In doing so, I noticed that changing the name of the reference line re-orders the legend, so that sometimes the horizontal line appears before the scatter legend elements, and sometimes afterwards.


Below is a reproducible example:


```r
library(dplyr)    # for %>% and select()
library(ggplot2)

YEAR <- as.integer(rep(2010:2020, 5))
SERIES_NAME <- rep(LETTERS[1:5], each = 11)
OBS_VALUE <- runif(n = 55, min = -5, max = 20)
EA <- ifelse(SERIES_NAME == "A", "Option 1", "Option 2")

df <- data.frame(YEAR = YEAR,
                 SERIES_NAME = SERIES_NAME,
                 OBS_VALUE = OBS_VALUE,
                 EA = EA)
```

Comment out one of the two lines below to choose the name given to the horizontal line in the legend:

```r
aaaaaaa <- "RGCUZYMSFP"  # the line's legend entry appears above the points
aaaaaaa <- "IZTCYUXGBO"  # the line's legend entry appears below the points
```

The graphs are then produced with:


```r
EAMean <- mean(df$OBS_VALUE)  # assumed reference value; EAMean is not defined in the original post

df %>%
  select(YEAR, SERIES_NAME, OBS_VALUE, EA) %>%
  ggplot() +
  ggplot2::geom_point(
    ggplot2::aes(
      x = YEAR,
      y = OBS_VALUE,
      col = EA),
    size = 2) +
  ggplot2::guides(
    color = ggplot2::guide_legend(nrow = 2,
                                  byrow = TRUE)) +
  scale_linetype_manual(values = 2) +
  scale_x_continuous(breaks = seq(2010, 2020, 5)) +
  geom_hline(aes(yintercept = EAMean,
                 linetype = aaaaaaa),
             size = 1, color = "black")
```

I've also noticed that changing the name of the variable changes the legend order. If I rename the variable from aaaaaaaa (8 times the letter a) to aaaaaaa (7 times the letter a) and update the code for the horizontal line accordingly, the legend is re-ordered.

Is there a way that I can be more consistent in controlling where my legend item goes?

Answer:

As rightly pointed out by @stefan, it depends on a 'secret algorithm'. While the outcome of the algorithm is unpredictable if 'order' isn't set, we can reverse-engineer what it does.
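In practice, the way to make the position predictable is to set `order` explicitly in each `guide_legend()`. The following is a sketch built on the question's data; `ref_name`, `ref_value` and `ref_line` are hypothetical names introduced here, the reference value is an assumption (the question never defines `EAMean`), and the line is drawn from a one-row data frame so that only a single line is plotted.

```r
library(ggplot2)

# Data rebuilt from the question's reproducible example
df <- data.frame(
  YEAR = as.integer(rep(2010:2020, 5)),
  SERIES_NAME = rep(LETTERS[1:5], each = 11),
  OBS_VALUE = runif(n = 55, min = -5, max = 20)
)
df$EA <- ifelse(df$SERIES_NAME == "A", "Option 1", "Option 2")

ref_name <- "IZTCYUXGBO"         # hypothetical; swap in "RGCUZYMSFP" and the order stays the same
ref_value <- mean(df$OBS_VALUE)  # hypothetical reference value (EAMean is not defined in the question)
ref_line <- data.frame(y = ref_value, label = ref_name)  # one row, so one line

p <- ggplot(df) +
  geom_point(aes(x = YEAR, y = OBS_VALUE, colour = EA), size = 2) +
  geom_hline(data = ref_line, aes(yintercept = y, linetype = label), colour = "black") +
  scale_linetype_manual(values = 2) +
  scale_x_continuous(breaks = seq(2010, 2020, 5)) +
  guides(
    colour = guide_legend(nrow = 2, byrow = TRUE, order = 1),  # points legend first
    linetype = guide_legend(order = 2)                          # reference line second
  )

print(p)
```

Because both guides now carry an explicit order, the label text no longer influences which legend comes first.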

In the guide_legend() methods, a 'hash' is constructed that is (practically) unique for a guide's title, labels, direction and name. The unpredictability comes from the hashing itself, which can produce very different hashes for similar but non-identical input.
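To see why a one-letter change can reorder the legend, here is a minimal sketch. ggplot2's exact hashing call may differ between versions, so `rlang::hash()` (rlang is a ggplot2 dependency) stands in for it, and `legend_hash()` is a hypothetical helper that hashes the four fields listed above. The title `"aaaaaaa"` mirrors the question, where the legend title comes from the variable name used in `aes()`.

```r
# legend_hash() is a hypothetical helper; rlang::hash() stands in for
# whatever hashing function the installed ggplot2 version uses internally.
legend_hash <- function(title, labels, direction = "vertical", name = "legend") {
  rlang::hash(list(title, labels, direction, name))
}

h_above <- legend_hash(title = "aaaaaaa", labels = "RGCUZYMSFP")
h_below <- legend_hash(title = "aaaaaaa", labels = "IZTCYUXGBO")
h_again <- legend_hash(title = "aaaaaaa", labels = "RGCUZYMSFP")
h_title <- legend_hash(title = "aaaaaaaa", labels = "RGCUZYMSFP")  # one extra "a" in the title

identical(h_above, h_again)    # TRUE: same input, same hash
c(h_above, h_below, h_title)   # three unrelated-looking strings
```

Identical input always reproduces the same hash, but there is no predictable relationship between the hashes of similar inputs, so which one sorts first is effectively arbitrary.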

Later, in the internal guides_merge() function, we can see that guides (or guide definitions) are split on their hash. Since hashing always produces the same output for the same input, this tells ggplot2 which guides are 'mergeable' because they share a title, labels, direction and name.
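The grouping step can be mimicked in base R. This is a sketch with made-up guide definitions (short strings stand in for real hashes), using `split()` for the grouping and `Reduce()` to combine each group; it illustrates the idea and is not ggplot2's own merge logic.

```r
# Hypothetical guide definitions: a hash plus the aesthetic each guide covers
gdefs <- list(
  list(hash = "b7f1", aes = "colour"),
  list(hash = "3a9c", aes = "linetype"),
  list(hash = "b7f1", aes = "shape")  # same hash as the colour guide, so it gets merged
)

keys <- vapply(gdefs, function(g) g$hash, character(1))

# Group the guides by hash, then fold each group into a single guide
merged <- lapply(split(gdefs, keys), function(group) {
  Reduce(function(a, b) list(hash = a$hash, aes = c(a$aes, b$aes)), group)
})

names(merged)         # "3a9c" "b7f1": groups come back sorted by key
merged[["b7f1"]]$aes  # "colour" "shape": one merged legend
```

Note that the groups come back in sorted key order, not in the order the guides were defined, which is why the hash ends up deciding the legend order.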

The 'order' argument works by pasting the order, zero-padded to two digits, in front of the hash, so that in lexicographical order guides with order = 1 precede guides with order = 2. Guides whose order is left at the default of 0 are given 99, so they come after every guide with an explicit order. Because splitting returns its groups sorted by the split keys, this effectively sorts the output on order first and hash second.
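The ordering rule can be checked with a small base R sketch. The hashes and `order` values are made up, and the key construction follows the description above: an unset order (0) becomes 99, the order is zero-padded to two digits, and it is pasted in front of the hash.

```r
# Hypothetical hashes for three guides, and their 'order' settings (0 = not set)
hashes <- c(colour = "f3c2", linetype = "1be0", size = "8d47")
orders <- c(colour = 1L, linetype = 0L, size = 2L)

# Unset order becomes 99; pad to two digits so "02" sorts before "10"
prefix <- sprintf("%02d", ifelse(orders == 0L, 99L, orders))
keys <- setNames(paste(prefix, hashes, sep = "_"), names(hashes))

names(sort(keys))  # "colour" "size" "linetype"
```

Without the prefix, `sort(hashes)` would order the guides by hash alone, which is the 'secret algorithm' case from the question.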