How do I display a correlation coefficient in a scatterplot?

Time:12-27

In a scatterplot, I would like to display both the correlation coefficient and an equation describing the relationship between x and y. I have created my data; here is my code so far:

library(tidyverse)

# Create the data

salary <- c(95, 100, 105, 110, 120, 124, 135, 150, 165, 175, 225, 230, 235, 260)
height <- c(160, 150, 182, 165, 172, 175, 183, 187, 174, 193, 201, 172, 180, 188)
fakenumbers <- data.frame(salary, height)

cor(height, salary, method = c("pearson"))

# Creation of scatterplot

r <- ggplot(fakenumbers, aes(x = height, y = salary)) +
  geom_point(size = 3, shape = 21, color = "black", fill = "blue") +
  labs(y = "Hourly salary 
       (sek)", x = "height (cm)", title = "Relationship between height and salary (made up data)") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, size = 18),
        axis.title = element_text(size = 15),
        axis.title.y = element_text(angle = 0, vjust = 0.5),
        axis.text = element_text(size = 11))

# Adding a regression line

r + geom_smooth(method = lm, formula = y ~ x, se = FALSE)

Inside the coordinate system, next to the regression line, I would like to display "r = 0.588" and an equation describing the linear relationship. How can I accomplish this, preferably with ggplot() but otherwise with some other function?

CodePudding user response:

We can do this with the ggpubr package by adding stat_cor(p.accuracy = 0.001, r.accuracy = 0.01) to your code:

library(ggpubr)
library(tidyverse)

r <- ggplot(fakenumbers, aes(x = height, y = salary)) +
  geom_point(size = 3, shape = 21, color = "black", fill = "blue") +
  stat_cor(p.accuracy = 0.001, r.accuracy = 0.01) +
  labs(y = "Hourly salary 
       (sek)", x = "height (cm)", title = "Relationship between height and salary (made up data)") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, size = 18),
        axis.title = element_text(size = 15),
        axis.title.y = element_text(angle = 0, vjust = 0.5),
        axis.text = element_text(size = 11))
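
The question also asks for the regression equation, which stat_cor() alone does not print. As a minimal sketch, assuming ggpubr is installed, stat_regline_equation() from the same package can be stacked under the correlation label. The label.x and label.y positions below are data coordinates picked by eye for this data set, not values from the original answer.

```r
library(ggplot2)
library(ggpubr)

fakenumbers <- data.frame(
  salary = c(95, 100, 105, 110, 120, 124, 135, 150, 165, 175, 225, 230, 235, 260),
  height = c(160, 150, 182, 165, 172, 175, 183, 187, 174, 193, 201, 172, 180, 188)
)

p <- ggplot(fakenumbers, aes(x = height, y = salary)) +
  geom_point(size = 3, shape = 21, color = "black", fill = "blue") +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE) +
  # Correlation label (Pearson); position is an assumption chosen for this data
  stat_cor(method = "pearson", label.x = 150, label.y = 250) +
  # Fitted-line equation, placed just below the correlation label
  stat_regline_equation(formula = y ~ x, label.x = 150, label.y = 235) +
  theme_classic()

print(p)
```

Both labels are computed from the plotted data, so they stay in sync if the data changes.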


CodePudding user response:

Here is a base R way. Define a formula fo, fit the regression, and build the equation as a string.

corr <- cor(height, salary, method = c("pearson"))

fo <- salary ~ height
fit <- lm(fo, fakenumbers)
(eq <- paste0(all.vars(fo)[1], ' ~ ', paste0(round(coef(fit), 2),
              gsub('\\*\\(Intercept\\)', '', 
                   paste0('*', names(coef(fit)))), collapse=' + ')))
# [1] "salary ~ -281.58 + 2.49*height"

Then use variables in plot(), abline(), and text().

plot(fo, fakenumbers, pch=20, col=4,
     xlab='height (cm)', ylab='Hourly salary (sek)',
     main='Relationship between height and salary (made up data)')
abline(fit, col=4)
text(149, 250, bquote(italic('r=')~.(round(corr, 3))), adj=0, cex=.8)
text(149, 235, eq, adj=0, cex=.8)
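
As a side note, for a simple linear regression with one predictor, r can also be recovered from the fitted model: it is the square root of R-squared, carrying the sign of the slope. The sketch below shows this along with a sprintf()-based way to build the two labels. The names b, eq_label, r_from_fit, and r_label are illustrative and not part of the answer above.

```r
fakenumbers <- data.frame(
  salary = c(95, 100, 105, 110, 120, 124, 135, 150, 165, 175, 225, 230, 235, 260),
  height = c(160, 150, 182, 165, 172, 175, 183, 187, 174, 193, 201, 172, 180, 188)
)

fit <- lm(salary ~ height, data = fakenumbers)
b <- unname(coef(fit))  # b[1] = intercept, b[2] = slope

# Print the slope's sign as a separate operator so a negative slope
# does not come out as "+ -2.49"
eq_label <- sprintf("salary = %.2f %s %.2f * height",
                    b[1], if (b[2] < 0) "-" else "+", abs(b[2]))

# With one predictor, r = sign(slope) * sqrt(R^2)
r_from_fit <- sign(b[2]) * sqrt(summary(fit)$r.squared)
r_label <- sprintf("r = %.3f", r_from_fit)

print(eq_label)
print(r_label)
```

Either string can be passed to text() in place of the labels built above.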



Data:

fakenumbers <- structure(list(salary = c(95, 100, 105, 110, 120, 124, 135, 150, 
165, 175, 225, 230, 235, 260), height = c(160, 150, 182, 165, 
172, 175, 183, 187, 174, 193, 201, 172, 180, 188)), class = "data.frame", row.names = c(NA, 
-14L))

CodePudding user response:

Another way:

round(cor(height, salary, method = c("pearson")), 4) -> corr

and then using geom_text to display the correlation coefficient:

r +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE) +
  geom_text(x = 152, y = 250,
            label = paste0('r = ', corr),
            color = 'red')
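
One caveat with geom_text() here: the layer inherits the plot data, so the same label is drawn once per row, 14 times on top of itself, which can look heavier than expected. A possible alternative, sketched below with ggplot2's annotate(), draws the label a single time. The coordinates and hjust value are assumptions chosen for this data set.

```r
library(ggplot2)

fakenumbers <- data.frame(
  salary = c(95, 100, 105, 110, 120, 124, 135, 150, 165, 175, 225, 230, 235, 260),
  height = c(160, 150, 182, 165, 172, 175, 183, 187, 174, 193, 201, 172, 180, 188)
)

corr <- round(cor(fakenumbers$height, fakenumbers$salary, method = "pearson"), 4)

p <- ggplot(fakenumbers, aes(x = height, y = salary)) +
  geom_point(size = 3, shape = 21, color = "black", fill = "blue") +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE) +
  # annotate() does not use the plot data, so the label is drawn once
  annotate("text", x = 152, y = 250,
           label = paste0("r = ", corr),
           hjust = 0, color = "red")

print(p)
```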
