In a scatterplot, I would like to display both the correlation coefficient and an equation describing the relationship between x and y. I have created my data; here is my code so far:
library(tidyverse)
# Create the data
salary <- c(95, 100, 105, 110, 120, 124, 135, 150, 165, 175, 225, 230, 235, 260)
height <- c(160, 150, 182, 165, 172, 175, 183, 187, 174, 193, 201, 172, 180, 188)
fakenumbers <- data.frame(salary, height)
cor(height, salary, method = c("pearson"))
# Create the scatterplot
r <- ggplot(fakenumbers, aes(x = height, y = salary)) +
  geom_point(size = 3, shape = 21, color = "black", fill = "blue") +
  labs(y = "Hourly salary\n(sek)", x = "height (cm)", title = "Relationship between height and salary (made up data)") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, size = 18),
        axis.title = element_text(size = 15),
        axis.title.y = element_text(angle = 0, vjust = 0.5),
        axis.text = element_text(size = 11))
# Add a regression line
r + geom_smooth(method = lm, formula = y ~ x, se = FALSE)
Inside the coordinate system, next to the regression line, I would like "r = 0.588" displayed along with an equation describing the linear relationship. How can I accomplish this, preferably with ggplot() or some other function?
Answer:
We could do it with the ggpubr package by adding stat_cor(p.accuracy = 0.001, r.accuracy = 0.01) to your code (by default it prints both the correlation coefficient R and its p-value):
library(ggpubr)
library(tidyverse)
r <- ggplot(fakenumbers, aes(x = height, y = salary)) +
  geom_point(size = 3, shape = 21, color = "black", fill = "blue") +
  stat_cor(p.accuracy = 0.001, r.accuracy = 0.01) +
  labs(y = "Hourly salary\n(sek)", x = "height (cm)", title = "Relationship between height and salary (made up data)") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, size = 18),
        axis.title = element_text(size = 15),
        axis.title.y = element_text(angle = 0, vjust = 0.5),
        axis.text = element_text(size = 11))
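stat_cor() only adds the correlation. ggpubr also provides stat_regline_equation(), which prints the fitted linear equation, so the two labels can be stacked inside the panel. This is a minimal sketch, assuming ggpubr is installed; the label.x/label.y positions are in data units and were hand-picked for this data, so they would need adjusting for other data. The theme settings from above are omitted for brevity.

```r
library(ggplot2)
library(ggpubr)

# Data from the question
fakenumbers <- data.frame(
  salary = c(95, 100, 105, 110, 120, 124, 135, 150, 165, 175, 225, 230, 235, 260),
  height = c(160, 150, 182, 165, 172, 175, 183, 187, 174, 193, 201, 172, 180, 188)
)

p <- ggplot(fakenumbers, aes(x = height, y = salary)) +
  geom_point(size = 3, shape = 21, color = "black", fill = "blue") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Correlation coefficient and p-value near the top-left corner (positions are hand-picked)
  stat_cor(p.accuracy = 0.001, r.accuracy = 0.01, label.x = 150, label.y = 260) +
  # Fitted regression equation just below it
  stat_regline_equation(label.x = 150, label.y = 245)

print(p)
```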
Answer:
Here is a base R way. Define a formula fo, calculate the regression, and define an equation eq.
corr <- cor(height, salary, method = c("pearson"))
fo <- salary ~ height
fit <- lm(fo, fakenumbers)
# Paste each coefficient to its term name, drop the "*(Intercept)" suffix, and join the terms with " + "
(eq <- paste0(all.vars(fo)[1], ' ~ ', paste0(round(coef(fit), 2),
                                            gsub('\\*\\(Intercept\\)', '',
                                                 paste0('*', names(coef(fit)))), collapse=' + ')))
# [1] "salary ~ -281.58 + 2.49*height"
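Joining the terms with " + " always writes a plus sign, so a negative slope would come out as "+ -1.5*x". A small helper can pick the operator from the sign of the slope instead. This is a sketch for a single-predictor lm() fit only; the function name lm_eq_text is hypothetical, and it writes "=" rather than "~".

```r
# Data from the question
fakenumbers <- data.frame(
  salary = c(95, 100, 105, 110, 120, 124, 135, 150, 165, 175, 225, 230, 235, 260),
  height = c(160, 150, 182, 165, 172, 175, 183, 187, 174, 193, 201, 172, 180, 188)
)

# Hypothetical helper: format a one-predictor lm() fit as "y = a + b*x",
# writing "- b" instead of "+ -b" when the slope is negative.
lm_eq_text <- function(fit, digits = 2) {
  cf <- round(coef(fit), digits)
  response <- all.vars(formula(fit))[1]
  predictor <- names(cf)[2]
  op <- if (cf[[2]] < 0) "-" else "+"
  paste0(response, " = ",
         formatC(cf[[1]], format = "f", digits = digits), " ", op, " ",
         formatC(abs(cf[[2]]), format = "f", digits = digits), "*", predictor)
}

fit <- lm(salary ~ height, data = fakenumbers)
eq <- lm_eq_text(fit)
eq
# [1] "salary = -281.58 + 2.49*height"
```

The resulting string can be passed to text() or to a ggplot annotation in the same way as eq above.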
Then use these variables in plot(), abline(), and text().
plot(fo, fakenumbers, pch=20, col=4,
     xlab='height (cm)', ylab='Hourly salary (sek)',
main='Relationship between height and salary (made up data)')
abline(fit, col=4)
text(149, 250, bquote(italic('r=')~.(round(corr, 3))), adj=0, cex=.8)
text(149, 235, eq, adj=0, cex=.8)
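The text() calls above use hard-coded coordinates (149, 250 and 149, 235) that only suit this particular data. One alternative is legend() with a position keyword such as "topleft", which places the labels relative to the plot region. A sketch, assuming the same fakenumbers data; bty = "n" just suppresses the legend box, and the "+" in the equation string assumes a positive slope, as in this data.

```r
# Data from the question
fakenumbers <- data.frame(
  salary = c(95, 100, 105, 110, 120, 124, 135, 150, 165, 175, 225, 230, 235, 260),
  height = c(160, 150, 182, 165, 172, 175, 183, 187, 174, 193, 201, 172, 180, 188)
)

fo <- salary ~ height
fit <- lm(fo, fakenumbers)
corr <- cor(fakenumbers$height, fakenumbers$salary)
cf <- round(coef(fit), 2)

plot(fo, fakenumbers, pch = 20, col = 4,
     xlab = "height (cm)", ylab = "Hourly salary (sek)",
     main = "Relationship between height and salary (made up data)")
abline(fit, col = 4)

# "topleft" is resolved against the current plot region, so no data coordinates are needed.
# The "+" below assumes a positive slope (true for this data).
legend("topleft", bty = "n", cex = 0.8,
       legend = c(sprintf("r = %.3f", corr),
                  sprintf("salary = %.2f + %.2f*height", cf[[1]], cf[[2]])))
```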
Data:
fakenumbers <- structure(list(salary = c(95, 100, 105, 110, 120, 124, 135, 150,
165, 175, 225, 230, 235, 260), height = c(160, 150, 182, 165,
172, 175, 183, 187, 174, 193, 201, 172, 180, 188)), class = "data.frame", row.names = c(NA,
-14L))
CodePudding user response:
Another way:
round(cor(height, salary, method = c("pearson")), 4) -> corr
and then use geom_text to display the correlation coefficient:
r +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE) +
  geom_text(x = 152, y = 250,
            label = paste0('r = ', corr),
            color = 'red')
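Because geom_text() gets constant x, y and label values here, it draws one copy of the label for every row of fakenumbers (14 stacked copies). annotate() draws a single label, and that label can carry the regression equation as well. A sketch; the position (150, 250) is hand-picked, and the "+" in the format string assumes a positive slope, as in this data.

```r
library(ggplot2)

# Data from the question
fakenumbers <- data.frame(
  salary = c(95, 100, 105, 110, 120, 124, 135, 150, 165, 175, 225, 230, 235, 260),
  height = c(160, 150, 182, 165, 172, 175, 183, 187, 174, 193, 201, 172, 180, 188)
)

fit <- lm(salary ~ height, data = fakenumbers)
corr <- cor(fakenumbers$height, fakenumbers$salary)
cf <- round(coef(fit), 2)

# Two-line label: correlation on top, fitted equation below ("+" assumes a positive slope)
label_text <- sprintf("r = %.3f\nsalary = %.2f + %.2f*height", corr, cf[[1]], cf[[2]])

p <- ggplot(fakenumbers, aes(x = height, y = salary)) +
  geom_point(size = 3, shape = 21, color = "black", fill = "blue") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # annotate() adds exactly one text label, independent of the number of data rows
  annotate("text", x = 150, y = 250, label = label_text, hjust = 0, color = "red")

print(p)
```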