How do I add a line of best fit for scatterplot with multiple variables in ggplot

Time:06-21

My data looks like this:

N <- 100  # assumed sample size; N is not given in the question
Male_0602 <- rnorm(N)
Female_0602 <- rnorm(N)
tce <- rnorm(N)

data <- data.frame(Male_0602, Female_0602, tce)

Currently, my code looks like this:

library(ggplot2)

ggplot(data, aes(x = tce*100)) +
  geom_point(aes(y = Male_0602, colour = "Male")) +
  geom_point(aes(y = Female_0602, colour = "Female")) +
  labs(x = "Province-level tariff cut",
       y = "Change in province-level wage between 2001 and 2005")

My scatterplot currently looks like this:

[Image: the current scatterplot of Male and Female wage changes against the tariff cut]

I would like to add a separate line of best fit for Male and Female. Can you please tell me how to do that? Thank you.

Answer:

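Convert the data frame into long format with pivot_longer(), map colour to the new gender column, and add geom_smooth(). Because colour defines the groups, geom_smooth(method = "lm") fits a separate straight line for each gender.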

library(tidyr)    # pivot_longer(); also re-exports %>%
library(ggplot2)

data %>% 
  pivot_longer(cols = c(Male_0602, Female_0602), names_to = "gender") %>% 
  ggplot(aes(x = tce*100, y = value, color = gender)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +   # one fitted line per colour group
  labs(x = "Province-level tariff cut",
       y = "Change in province-level wage between 2001 and 2005")
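To see what the reshaped data looks like and which lines are being drawn, the sketch below reshapes simulated data (the seed and N = 50 are assumptions, not the asker's data) and fits an ordinary least-squares regression of value on tce * 100 separately for each gender. These per-group fits should correspond to the straight lines geom_smooth(method = "lm") draws for each colour.

```r
library(tidyr)

# Hypothetical example data in the same wide shape as the question's.
set.seed(42)
N <- 50
data <- data.frame(
  Male_0602   = rnorm(N),
  Female_0602 = rnorm(N),
  tce         = rnorm(N)
)

# Long format: each original row becomes two rows, one per wage column,
# with the column name in `gender` and the wage change in `value`.
long <- pivot_longer(data, cols = c(Male_0602, Female_0602), names_to = "gender")
print(head(long))

# Fit value ~ tce * 100 separately within each gender group.
fits  <- lapply(split(long, long$gender),
                function(d) lm(value ~ I(tce * 100), data = d))
coefs <- t(sapply(fits, coef))   # one row of (intercept, slope) per gender
print(coefs)
```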


Answer:

The following code converts the data frame into a longer format so that both variables can be plotted with a single geom, then uses geom_smooth() to add the lines of best fit.

library(tidyr)
library(ggplot2)
library(forcats)  # for as_factor()

data %>% 
  pivot_longer(cols = c(Male_0602, Female_0602), names_to = "cats") %>% 
  ggplot(aes(x = tce*100, y = value, colour = as_factor(cats))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
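If you would rather keep the data in wide format, a sketch of an alternative is to add one geom_smooth() layer per column, mirroring the question's two geom_point() layers. The constant labels "Male" and "Female" inside aes() put each point layer and its line on the same colour. The simulated data (seed, N = 50) are assumptions for illustration only.

```r
library(ggplot2)

# Hypothetical data in the question's wide shape.
set.seed(7)
N <- 50
data <- data.frame(Male_0602 = rnorm(N), Female_0602 = rnorm(N), tce = rnorm(N))

# One point layer and one linear-fit layer per wage column; the shared
# constant colour label links each set of points to its fitted line.
p <- ggplot(data, aes(x = tce * 100)) +
  geom_point(aes(y = Male_0602, colour = "Male")) +
  geom_point(aes(y = Female_0602, colour = "Female")) +
  geom_smooth(aes(y = Male_0602, colour = "Male"),
              method = "lm", formula = y ~ x, se = FALSE) +
  geom_smooth(aes(y = Female_0602, colour = "Female"),
              method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Province-level tariff cut",
       y = "Change in province-level wage between 2001 and 2005",
       colour = NULL)

print(p)
```

This works for two series, but each extra series needs two more layers, which is why reshaping to long format scales better.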
