geom_smooth with geom_label_repel creates unwanted multiple labels

Time:07-13

I start off without including the data. The problem comes from using geom_smooth with lots of data points (i.e. a large data set), so a minimal data example for illustration purposes seems difficult to use (I tried). But I can submit the data if requested.

I have scores on several variables and want to see trends in these scores across the age of respondents (cross-sectional data). Data are now in long format (so the original variables are all under the column 'name').

Like this:

     age name     value
   <dbl> <chr>    <dbl>
 1    40 mo_clean     1
 2    40 mo_groc      3
 3    40 mo_trans     1
 4    40 mo_digi      3
 5    40 mo_emo       3
 6    40 mo_activ     1
 7    40 mo_supv      1
 8    40 mo_doct      1
 9    39 mo_clean     1
10    39 mo_groc      1
# … with 42,030 more rows

I want to:

  • use geom_smooth and geom_label and
  • then switch to ggrepel::geom_label_repel to avoid overlapping labels

Getting labels to work with geom_smooth turned out to be difficult, but I managed it with the code below:

library(dplyr)    # %>%, group_by(), do(), filter()
library(broom)    # augment()
library(ggplot2)
library(ggrepel)

df %>%
  {
    ggplot(df, aes(age, value, label = name, color = name)) +
      geom_smooth(se = FALSE) +
      guides(color = "none") +
      # label each curve at the fitted loess value for the highest age
      geom_label(
        data = group_by(., name) %>%
          do(augment(loess(value ~ age, .))) %>%
          filter(age == max(age)),
        aes(age, .fitted), nudge_x = 2
      )
  } +
  scale_x_continuous(breaks = seq(35, 65, by = 5)) +
  xlab("Age") +
  ylab(" ") +
  theme(text = element_text(size = 14))

which gives this result:

[Figure: result prior to trying ggrepel]

Now, as anticipated, substituting geom_label with geom_label_repel does not work, due to the many data points. I get the following warning:

`geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Warning message:
ggrepel: 720 unlabeled data points (too many overlaps). Consider increasing max.overlaps 

and all labels in the figure are dropped.

Increasing max.overlaps is not the way to go, I assume. Just to illustrate the extreme case, with max.overlaps = Inf:

[...]
      geom_label_repel(
        data = group_by(., name) %>%
          do(augment(loess(value ~ age, .))) %>%
          filter(age == max(age)),
        aes(age, .fitted), 
        max.overlaps = Inf
      )
[...]

[Figure: result with max.overlaps = Inf]

Any hint? For instance where to find help (or even code suggestions)? Lots of web searches have not given me what I'm looking for: how to combine geom_smooth with geom_label_repel to get a nice plot with each smoothed line labelled, without labels overlapping.

—-

My question refers to geom_smooth with lots of data points, the linked question (One solution

So, the error message leads down the wrong road in this case.


Concerning tjebo's suggestion in an alternative answer: I believe adding stat = "unique" is a better solution than the second chunk in tjebo's answer (although the resulting figure is still not what I want).

library(tidyverse)
library(ggrepel)

df <- 
  diamonds %>% select(age = table, name = color, value = price)
df %>%
  {
    ggplot(df, aes(age, value, label = name, color = name)) +
      geom_smooth(se = FALSE) +
      geom_label_repel(
        data = group_by(., name) %>%
          do(broom::augment(loess(value ~ age, .))) %>%
          filter(age == max(age)),
        aes(age, .fitted),
        stat = "unique"
      )
  } 
#> `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

[Figure: better, but not the final solution]

CodePudding user response:

I am answering because I'd like to demonstrate a way to reproduce the problem. There is hardly any problem that cannot be reproduced with one of the built-in data sets. For your problem, you could use the diamonds data set. It contains a similar number of rows and has similar columns (numeric and categorical).

library(tidyverse)
library(ggrepel)

df <- 
  diamonds %>% select(age = table, name = color, value = price)
df %>%
  {
    ggplot(df, aes(age, value, label = name, color = name)) +
      geom_smooth(se = FALSE) +
      geom_label_repel(
        data = group_by(., name) %>%
          do(broom::augment(loess(value ~ age, .))) %>%
          filter(age == max(age)),
        aes(age, .fitted)
      )
  } 
#> `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Here, on a small scale, you can see repeated labels - that's the core of the problem, and we have reproduced your problem.
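The repetition comes from the label data rather than from ggrepel itself: augment() returns one row per observation, and filter(age == max(age)) keeps every row that shares the maximum age, so each curve gets as many labels as there are observations at that age. Here is a minimal sketch of that mechanism and of one possible fix. The toy data frame and the names dup_df and label_df are made up for illustration and are not from the question.

```r
library(dplyr)
library(broom)

# Hypothetical toy data: 3 respondents per age (35 to 65) for each of two
# variables, so several rows share the maximum age, as in a large survey.
set.seed(1)
toy <- expand.grid(age = 35:65, rep = 1:3, name = c("mo_clean", "mo_groc"),
                   stringsAsFactors = FALSE) %>%
  mutate(value = 1 + 0.05 * (age - 35) + rnorm(n(), sd = 0.5))

# Same label data as in the plotting code: every row at the maximum age
# survives the filter, so there is more than one label row per curve.
dup_df <- toy %>%
  group_by(name) %>%
  do(augment(loess(value ~ age, .))) %>%
  filter(age == max(age))

# Keep a single row per group, so each curve gets exactly one label.
label_df <- dup_df %>%
  slice_head(n = 1) %>%
  ungroup()

count(dup_df, name)  # 3 rows per name in this toy data
nrow(label_df)       # one row per name
```

A data frame built like label_df can then be passed as the data argument of geom_label_repel() in place of the inline pipeline, which leaves ggrepel with only one label per curve to place.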

Two answers suggest how to label geom_smooth-derived curves. Here is an adaptation of my own suggestion, using the geomtextpath package. I admit the labels look awkwardly rotated in that case, but I guess with less awkward lines such as in your example it should look nicer.


## fix this using one of the suggested solutions in thread 
## https://stackoverflow.com/questions/29357612/plot-labels-at-ends-of-lines?noredirect=1&lq=1
library(geomtextpath)
ggplot(df, aes(age, value, label = name, color = name)) +
  ## note you currently have to specify method argument, otherwise the disambiguation of some function fails. 
  ## (see also https://github.com/AllanCameron/geomtextpath/issues/79)
  geom_labelsmooth(hjust = 1, method = "loess") 
#> `geom_smooth()` using formula 'y ~ x'

Created on 2022-07-12 by the reprex package (v2.0.1)
