Force geom_smooth() to plot regression line from origin to one set of points (R, ggplot, geom_smooth

Time:03-25

I have a very specific situation where I am trying to get a regression line to start at the origin and fit to a set of points that all share the same x value. Ideally I wouldn't have to do this, but the line from the origin to these points is actually meaningful in my case.

I have come up with a simple example:

library(dplyr)
library(ggplot2)
y<-c(1,2,3,4,5,6,7,8)
x<-c(3,3,3,3,3,3,3,3)
z<-as.data.frame(cbind(x,y))
z %>% ggplot(aes(x, y)) + geom_point() +
  geom_smooth(formula = y ~ x + 0, method = "lm") +
  theme_bw() + expand_limits(x = 0, y = 0) +
  theme(aspect.ratio = 1)


Here, geom_smooth() will not fit a line from the origin to the points at x = 3. I'm assuming that something inside geom_smooth() tells it not to plot a line where there is no variation in x. I've tested this somewhat: by changing one of the x values to 0 I can indeed get a line from the origin, though the y value I choose for that point influences the confidence interval, which is not ideal.

y<-c(1,2,3,4,5,6,7,8)
x<-c(3,3,3,3,3,3,3,0)
z<-as.data.frame(cbind(x,y))
z %>% ggplot(aes(x, y)) + geom_point() +
  geom_smooth(formula = y ~ x + 0, method = "lm") +
  theme_bw() + expand_limits(x = 0, y = 0) +
  theme(aspect.ratio = 1)


I don't want to fiddle with the dataset and add a point at x = 0, y = 0, as I'm worried about that influencing some error estimate (however small). I'm assuming that there is some option I can set within geom_smooth(), or some other command, to force the line to fit. Any help is appreciated, thanks.

CodePudding user response:

Remember that linear regression just tells you the conditional mean of y for a given x. The "regression" at x = 3 is simply the best estimate of the mean of y at x = 3. Since all of your points are at x = 3, the conditional mean of y when x = 3 is just mean(y).

So all you need should be a line going from (0, 0) to (3, mean(y)). It really doesn't make much sense to have a standard error around this line, though perhaps it might be justified depending on the context.
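As a quick sanity check of that claim, here is a minimal sketch in base R. It assumes the same toy data as the question and fits the through-origin model directly with lm(); the names fit, slope and pred_at_3 are only illustrative.

```r
# Same toy data as in the question: every point sits at x = 3
y <- c(1, 2, 3, 4, 5, 6, 7, 8)
x <- rep(3, 8)

# Regression through the origin (no intercept term)
fit <- lm(y ~ x + 0)

# With a single x value the slope reduces to mean(y) / 3
slope <- unname(coef(fit))

# The fitted value at x = 3 is therefore mean(y)
pred_at_3 <- unname(predict(fit, newdata = data.frame(x = 3)))

print(slope)      # 1.5
print(pred_at_3)  # 4.5
print(mean(y))    # 4.5
```

So lm() itself has no trouble with this data; the fitted line passes through (0, 0) and (3, mean(y)).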

library(dplyr)
library(ggplot2)
y<-c(1,2,3,4,5,6,7,8)
x<-c(3,3,3,3,3,3,3,3)
z<-as.data.frame(cbind(x,y))
z %>% ggplot(aes(x, y)) + geom_point() +
  geom_smooth(formula = y ~ x + 0, method = "lm") +
  theme_bw() + expand_limits(x = 0, y = 0) +
  theme(aspect.ratio = 1) +
  geom_line(data = data.frame(x = c(0, 3), y = c(0, mean(y))))


CodePudding user response:

Kind of silly but workable solution that I have figured out.

If I add an incredibly small amount of random variation to values in the x axis and specify fullrange = TRUE within geom_smooth then I can get the line to fit with an error estimate.

y<-c(1,2,3,4,5,6,7,8)
x<-c(3,3,3,3,3,3,3,3)
z<-as.data.frame(cbind(x,y))
z %>% mutate(rand = rnorm(8, mean=0.0000000001, sd=0.000000000001), x = x + rand) %>% 
  ggplot(aes(x, y)) + geom_point() +
  geom_smooth(formula = y ~ x + 0, fullrange = TRUE, method = "lm") +
  theme_bw() + expand_limits(x = 0, y = 0) +
  theme(aspect.ratio = 1)
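
If you would rather not perturb the data at all, one possible alternative (a sketch, not something either answer above proposes) is to fit the model with lm() yourself and draw the line and confidence band from predict(). This sidesteps geom_smooth() entirely. The names grid, band and p are illustrative, and whether a confidence band is meaningful here still depends on your context.

```r
library(ggplot2)

# Same toy data as in the question
z <- data.frame(x = rep(3, 8), y = c(1, 2, 3, 4, 5, 6, 7, 8))

# Through-origin fit on the untouched data
fit <- lm(y ~ x + 0, data = z)

# Predict along a grid from the origin to x = 3, with a 95% confidence interval
grid <- data.frame(x = seq(0, 3, length.out = 50))
band <- cbind(grid, as.data.frame(predict(fit, newdata = grid, interval = "confidence")))

# band has columns x, fit, lwr, upr; draw the ribbon and line from it
p <- ggplot(z, aes(x, y)) +
  geom_point() +
  geom_ribbon(data = band, aes(x = x, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, alpha = 0.2) +
  geom_line(data = band, aes(x = x, y = fit)) +
  theme_bw() + expand_limits(x = 0, y = 0) +
  theme(aspect.ratio = 1)

print(p)
```

Because the interval comes straight from the lm() fit, it does not depend on random jitter and is identical on every run.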