I am using ggscatter in R to plot a Pearson correlation between two variables. However, when I color the points, a separate reg.line is computed for each color. What I want is to color the points according to the column named 'mycolor' while having the regression line computed on the whole data, regardless of color.
Here are the calls I use, with and without color:
df <- structure(list(my_x = c(131L, 100L, NA, 125L, 50L, 50L, 16L,
3L, 27L, 96L, 176L, 121L, 129L, 84L, 67L, 35L, 36L, 18L, 29L,
29L, 26L, 25L, 24L, 20L, 28L, 22L, 25L, 15L, 0L, 18L, 13L, 17L,
14L, 23L, 27L, NA, 6L, 1L, 7L, 1L, 20L, 30L, 16L, 22L, 23L, 22L,
17L, 12L, 14L, 28L, 16L, 20L, 44L, 27L, 16L, 6L, 10L, 9L, 16L,
2L, 43L, 6L, 2L, 0L, 1L, 1L, 1L, 1L, 2L, 1L, 47L, 22L, 7L, 3L,
4L, 3L, 1L, 1L, 1L, 4L, 4L, 1L, 25L, 3L, 3L, 3L, 6L, 6L, 4L,
1L, 2L, 2L, 5L, 8L, 3L, 5L, 1L, 1L, 1L, 2L, 3L, 6L, 6L, 4L, 8L,
1L, 4L, 1L, 5L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 0L, 0L,
2L, 0L, 1L, 2L, 3L, 3L, 4L, 4L, 3L, 2L, 3L, 1L, 2L, 1L), my_y = c(134L,
90L, 130L, 134L, 44L, 48L, 17L, 4L, 19L, 97L, 178L, 39L, 132L,
90L, 35L, 35L, 36L, 18L, 28L, 14L, 25L, 26L, 24L, 18L, 25L, 22L,
9L, 15L, 0L, 21L, 6L, 15L, 15L, 21L, 27L, 19L, 7L, 0L, 8L, 2L,
10L, 30L, 19L, 23L, 12L, 23L, 16L, 6L, 14L, 29L, 15L, 12L, 21L,
14L, 11L, 7L, 5L, 4L, 16L, 5L, 36L, 5L, 2L, 0L, 1L, 1L, 1L, 1L,
2L, 1L, 50L, 22L, 7L, 3L, 6L, 3L, 1L, 1L, 1L, 4L, 4L, 1L, 21L,
3L, 3L, 3L, 6L, 7L, 4L, 1L, 2L, 2L, 1L, 6L, 3L, 2L, 1L, 1L, 2L,
2L, 3L, 2L, 6L, 7L, 6L, 1L, 4L, 1L, 5L, 2L, 1L, 2L, 2L, 2L, 2L,
1L, 2L, 2L, 1L, 0L, 0L, 2L, 0L, 1L, 2L, 3L, 2L, 4L, 4L, 3L, 2L,
3L, 1L, 2L, 1L), mycolor = c("color1", "color1", "color1",
"color1", "color1", "color1", "color1", "color1", "color1",
"color1", "color1", "color1", "color1", "color1", "color1",
"color2", "color2", "color2", "color2", "color2", "color2",
"color2", "color2", "color2", "color2", "color2", "color2",
"color2", "color2", "color2", "color2", "color2", "color2",
"color2", "color2", "color2", "color2", "color7",
"Turtle", "Turtle", "color2", "color2", "color2", "color2",
"color2", "color2", "color2", "color2", "color2", "color2",
"color2", "color2", "color2", "color2", "color2", "color2",
"color2", "color2", "color2", "color3", "color4",
"color4", "color4", "color4", "color4",
"color4", "color4", "color4", "color4",
"color4", "color4", "color4", "color5",
"color5", "color5", "color5", "color5",
"color5", "color5", "color5", "color5",
"color5", "color5", "color5", "color5",
"color5", "color5", "color6", "color6", "color6", "color6",
"color6", "color6", "color6", "color6", "color6", "color6", "color6", "color6",
"color6", "color6", "color6", "color6", "color6", "color6", "color6", "color6",
"color6", "color6", "color6", "color6", "color6", "color6", "color6", "color6",
"color6", "color6", "color6", "color6", "color6", "color6", "color6", "color6",
"color6", "color6", "color6", "color6", "color6", "color6", "color6", "color6",
"color6", "color6", "color6", "color6")), class = "data.frame", row.names = c(NA,
-135L))
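library(ggpubr)
library(dplyr)  # provides %>%

# With color: one regression line per color group
df %>%
  ggscatter(., y = "my_y", x = "my_x",
            color = "mycolor",
            add = "reg.line", conf.int = TRUE,
            cor.coef = TRUE, cor.method = "pearson")

# Without color: a single regression line on the whole data
df %>%
  ggscatter(., y = "my_y", x = "my_x",
            add = "reg.line", conf.int = TRUE,
            cor.coef = TRUE, cor.method = "pearson")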
The two results :
Taking the example above, I basically want the plot on the left, but with its regression lines replaced by the single regression line of the right plot.
Is there any way to do this with ggscatter, or should I use ggplot2 geom_point and add the regression line myself?
Thanks for any help !
Maxime
CodePudding user response:
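IMHO the easiest approach would be to drop add = "reg.line" from ggscatter and add the regression line manually using geom_smooth with a fixed color, so that a single line is fitted to all points.
Using mtcars as example data: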
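library(ggpubr)
#> Loading required package: ggplot2
library(dplyr)  # provides %>% and mutate()

mtcars %>%
  mutate(cyl = factor(cyl)) %>%
  ggscatter(., y = "hp", x = "mpg",
            color = "cyl",
            cor.coef = TRUE, cor.method = "pearson") +
  # One black regression line fitted on all points, independent of color
  geom_smooth(method = "lm", color = "black")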
#> `geom_smooth()` using formula 'y ~ x'
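For the plain ggplot2 route mentioned in the question, a minimal sketch could look like the following. It assumes only ggplot2 is installed; the data frame toy is made-up illustration data that reuses the question's column names, and the subtitle label is just one possible way to show the coefficient. The color mapping is placed in geom_point only, so geom_smooth sees ungrouped data and should fit one line.

```r
library(ggplot2)

# Made-up stand-in for the question's `df` (same column names, invented values)
toy <- data.frame(
  my_x = c(1, 2, 3, 4, 5, 6),
  my_y = c(2, 1, 4, 3, 6, 5),
  mycolor = c("color1", "color1", "color2", "color2", "color3", "color3")
)

# Pearson correlation on the whole data, ignoring rows with NA
r <- cor(toy$my_x, toy$my_y, method = "pearson", use = "complete.obs")

p <- ggplot(toy, aes(x = my_x, y = my_y)) +
  # Color is mapped in the point layer only, not globally
  geom_point(aes(color = mycolor)) +
  # Single linear fit with confidence band over all points
  geom_smooth(method = "lm", formula = y ~ x, color = "black") +
  labs(subtitle = sprintf("Pearson r = %.2f", r))

print(p)
```

If color were mapped in the top-level aes() instead, geom_smooth would inherit it and fit one line per color, which is the behaviour the question is trying to avoid.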