Add a regression line to ggscatter plot but ignore grouping

Time:08-20

I am using ggscatter in R to plot a Pearson correlation between two variables. However, when I color the points, one regression line (reg.line) is computed for each color. What I want is to color the points according to the column named 'mycolor', but to compute the regression line on the whole data, regardless of color.

Here is the code I use, with and without color:

library(ggpubr)    # ggscatter(); attaches ggplot2
library(magrittr)  # %>%

df <- structure(list(my_x = c(131L, 100L, NA, 125L, 50L, 50L, 16L, 
3L, 27L, 96L, 176L, 121L, 129L, 84L, 67L, 35L, 36L, 18L, 29L, 
29L, 26L, 25L, 24L, 20L, 28L, 22L, 25L, 15L, 0L, 18L, 13L, 17L, 
14L, 23L, 27L, NA, 6L, 1L, 7L, 1L, 20L, 30L, 16L, 22L, 23L, 22L, 
17L, 12L, 14L, 28L, 16L, 20L, 44L, 27L, 16L, 6L, 10L, 9L, 16L, 
2L, 43L, 6L, 2L, 0L, 1L, 1L, 1L, 1L, 2L, 1L, 47L, 22L, 7L, 3L, 
4L, 3L, 1L, 1L, 1L, 4L, 4L, 1L, 25L, 3L, 3L, 3L, 6L, 6L, 4L, 
1L, 2L, 2L, 5L, 8L, 3L, 5L, 1L, 1L, 1L, 2L, 3L, 6L, 6L, 4L, 8L, 
1L, 4L, 1L, 5L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 0L, 0L, 
2L, 0L, 1L, 2L, 3L, 3L, 4L, 4L, 3L, 2L, 3L, 1L, 2L, 1L), my_y = c(134L, 
90L, 130L, 134L, 44L, 48L, 17L, 4L, 19L, 97L, 178L, 39L, 132L, 
90L, 35L, 35L, 36L, 18L, 28L, 14L, 25L, 26L, 24L, 18L, 25L, 22L, 
9L, 15L, 0L, 21L, 6L, 15L, 15L, 21L, 27L, 19L, 7L, 0L, 8L, 2L, 
10L, 30L, 19L, 23L, 12L, 23L, 16L, 6L, 14L, 29L, 15L, 12L, 21L, 
14L, 11L, 7L, 5L, 4L, 16L, 5L, 36L, 5L, 2L, 0L, 1L, 1L, 1L, 1L, 
2L, 1L, 50L, 22L, 7L, 3L, 6L, 3L, 1L, 1L, 1L, 4L, 4L, 1L, 21L, 
3L, 3L, 3L, 6L, 7L, 4L, 1L, 2L, 2L, 1L, 6L, 3L, 2L, 1L, 1L, 2L, 
2L, 3L, 2L, 6L, 7L, 6L, 1L, 4L, 1L, 5L, 2L, 1L, 2L, 2L, 2L, 2L, 
1L, 2L, 2L, 1L, 0L, 0L, 2L, 0L, 1L, 2L, 3L, 2L, 4L, 4L, 3L, 2L, 
3L, 1L, 2L, 1L), mycolor = c("color1", "color1", "color1", 
"color1", "color1", "color1", "color1", "color1", "color1", 
"color1", "color1", "color1", "color1", "color1", "color1", 
"color2", "color2", "color2", "color2", "color2", "color2", 
"color2", "color2", "color2", "color2", "color2", "color2", 
"color2", "color2", "color2", "color2", "color2", "color2", 
"color2", "color2", "color2", "color2", "color7", 
"Turtle", "Turtle", "color2", "color2", "color2", "color2", 
"color2", "color2", "color2", "color2", "color2", "color2", 
"color2", "color2", "color2", "color2", "color2", "color2", 
"color2", "color2", "color2", "color3", "color4", 
"color4", "color4", "color4", "color4", 
"color4", "color4", "color4", "color4", 
"color4", "color4", "color4", "color5", 
"color5", "color5", "color5", "color5", 
"color5", "color5", "color5", "color5", 
"color5", "color5", "color5", "color5", 
"color5", "color5", "color6", "color6", "color6", "color6", 
"color6", "color6", "color6", "color6", "color6", "color6", "color6", "color6", 
"color6", "color6", "color6", "color6", "color6", "color6", "color6", "color6", 
"color6", "color6", "color6", "color6", "color6", "color6", "color6", "color6", 
"color6", "color6", "color6", "color6", "color6", "color6", "color6", "color6", 
"color6", "color6", "color6", "color6", "color6", "color6", "color6", "color6", 
"color6", "color6", "color6", "color6")), class = "data.frame", row.names = c(NA, 
-135L))
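# Colored by mycolor: ggscatter fits one regression line per color
df %>%
  ggscatter(., y="my_y", x="my_x",
            color="mycolor",
            add = "reg.line", conf.int = TRUE, 
            cor.coef = TRUE, cor.method = "pearson")

# Without color: a single regression line fitted to all points
df %>%
  ggscatter(., y="my_y", x="my_x",
            add = "reg.line", conf.int = TRUE, 
            cor.coef = TRUE, cor.method = "pearson")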

The two results:

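[Image: left, points colored by mycolor with one regression line per color; right, uncolored points with a single regression line]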

In the example above, I want the plot on the left, but with its per-color regression lines replaced by the single regression line of the plot on the right.

Is there any way to do this with ggscatter, or should I use ggplot2's geom_point and add the regression line myself?
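For reference, the plain ggplot2 route could look roughly like the minimal sketch below. It assumes mtcars as stand-in data, with cyl playing the role of 'mycolor'; the names cars, ct, cor_label and p are illustrative only. The key idea is that color is mapped only inside geom_point(), so geom_smooth() sees no grouping variable and fits a single line to all points, while the Pearson correlation is computed once on all rows with cor.test().

```r
library(ggplot2)

# Assumption: mtcars stands in for the question's df; cyl plays the role of mycolor
cars <- transform(mtcars, cyl = factor(cyl))

# Pearson correlation computed once on all rows, regardless of group
ct <- cor.test(cars$mpg, cars$hp, method = "pearson")
cor_label <- sprintf("R = %.2f, p = %.2g", ct$estimate, ct$p.value)

p <- ggplot(cars, aes(x = mpg, y = hp)) +
  # colour is mapped only in this layer, so only the points are split by group
  geom_point(aes(colour = cyl)) +
  # this layer inherits only x and y, so one line is fitted to all points
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, colour = "black") +
  # overall correlation placed in the top-left corner of the panel
  annotate("text", x = -Inf, y = Inf, label = cor_label,
           hjust = -0.1, vjust = 1.5)

print(p)
```

The trade-off compared with ggscatter is that the correlation label has to be built by hand, but every layer's grouping is explicit.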

Thanks for any help!

Maxime

Answer:

IMHO, the easiest approach would be to add the regression line manually using geom_smooth().

Using mtcars as example data:

library(dplyr)  # mutate() and %>%
library(ggpubr)
#> Loading required package: ggplot2

mtcars %>%
  mutate(cyl = factor(cyl)) %>%
  ggscatter(., y="hp", x="mpg",
            color="cyl",
            cor.coef = TRUE, cor.method = "pearson") +
  # a fixed color (not mapped to cyl) means no grouping, so one line is fitted to all points
  geom_smooth(method = "lm", color = "black")
#> `geom_smooth()` using formula 'y ~ x'
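If the correlation coefficient should also be computed on the whole data rather than per color, one option is to drop cor.coef = TRUE and add ggpubr's stat_cor() as its own layer, as in this sketch (again on mtcars, with the illustrative names cars and p). The aes(group = 1) is a standard ggplot2 idiom that forces a single group; it is included here as a safeguard, on the assumption that any grouping aesthetic ggscatter() might pass down should be ignored for the correlation.

```r
library(ggpubr)  # ggscatter(), stat_cor(); also attaches ggplot2

# Assumption: mtcars stands in for the question's data, cyl for mycolor
cars <- transform(mtcars, cyl = factor(cyl))

p <- ggscatter(cars, x = "mpg", y = "hp", color = "cyl") +
  # fixed color, no mapping: one regression line over all points
  geom_smooth(method = "lm", formula = y ~ x, color = "black") +
  # overall Pearson correlation; group = 1 forces a single group
  stat_cor(aes(group = 1), method = "pearson")

print(p)
```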
