No line in ggplot when using glm in R

Time:09-19

I am trying to plot a logistic regression with ggplot using binomial data of the following form (the full data set has more than 6,000 rows):

age              result
50 and older     1
18-49 years old  1
50 and older     0
50 and older     1
18-49 years old  0

Using geom_smooth I am trying to make a visualization of this logistic regression model:

ggplot(data, aes(age, result)) +
   geom_smooth(method = "glm", formula = y ~ x, colour = "black", method.args = list(family = binomial))

However, the resulting plot shows no line at all.

CodePudding user response:

Because the x axis is discrete, ggplot2 puts each x value into its own group by default, so no line is drawn between them. To get a single fitted line, give every x value the same group aesthetic:

library(ggplot2)

ggplot(data, aes(age, result, group = 1)) +
  geom_smooth(method = "glm", formula = y ~ x, colour = "black", 
              method.args = list(family = binomial))
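To see what that single line represents, here is a rough sketch of an equivalent fit done by hand. It assumes that a discrete x scale places its levels at the integer positions 1, 2, ... (in alphabetical order for a character column), so the smoother is effectively regressing `result` on those positions. The names `x_num`, `fit_num`, and `grid` are made up for this illustration, and the data is simulated in the same way as in the "Data used" section.

```r
# Simulated data, same recipe as the answer's "Data used" section
set.seed(1)
data <- data.frame(
  age = rep(c("50 and older", "18-49 years old"), each = 3000),
  result = rbinom(6000, 1, rep(c(0.3, 0.5), each = 3000))
)

# Assumption: the discrete scale maps levels to integer positions,
# here "18-49 years old" -> 1 and "50 and older" -> 2
data$x_num <- as.integer(factor(data$age))

# Logistic regression on the numeric positions
fit_num <- glm(result ~ x_num, family = binomial, data = data)

# Predicted probabilities along the line between the two positions
grid <- data.frame(x_num = seq(1, 2, length.out = 5))
grid$prob <- predict(fit_num, newdata = grid, type = "response")
grid
```

With only two x positions, the two endpoints of the curve match the observed proportion of 1s in each age group. The values in between are interpolation with no data behind them.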


However, I'm not sure how meaningful this end result is, since your x axis groups are discrete, and it therefore doesn't make a lot of sense to have a continuous line or SE between them. If this were me, I would probably use point estimates with error bars:

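# Fit the logistic regression that the predictions are based on
model <- glm(result ~ age, family = binomial, data = data)

pred_df <- data.frame(age = c('50 and older', '18-49 years old'))
fit <- predict(model, newdata = pred_df, se.fit = TRUE, type = 'response')
pred_df$fit <- fit$fit
# Approximate 95% interval: estimate +/- 1.96 standard errors
pred_df$upper <- fit$fit + 1.96 * fit$se.fit
pred_df$lower <- fit$fit - 1.96 * fit$se.fit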

ggplot(pred_df, aes(age, fit)) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.25) +
  geom_point(size = 3) +
  ylim(c(0, 1))
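As a possible variation (not part of the original answer), the interval can be built on the logit scale and then back-transformed with `plogis()`, which keeps both bounds inside [0, 1] even when an estimate is close to 0 or 1. This is a minimal sketch that assumes the model is `glm(result ~ age, family = binomial)` fitted to data simulated as in the "Data used" section. The name `link` is made up for this illustration.

```r
# Simulated data, same recipe as the answer's "Data used" section
set.seed(1)
data <- data.frame(
  age = rep(c("50 and older", "18-49 years old"), each = 3000),
  result = rbinom(6000, 1, rep(c(0.3, 0.5), each = 3000))
)

# Assumed model: logistic regression of result on age group
model <- glm(result ~ age, family = binomial, data = data)

# Predict on the link (logit) scale, with standard errors
pred_df <- data.frame(age = c("50 and older", "18-49 years old"))
link <- predict(model, newdata = pred_df, se.fit = TRUE, type = "link")

# Back-transform the estimate and the interval bounds to probabilities
pred_df$fit   <- plogis(link$fit)
pred_df$lower <- plogis(link$fit - 1.96 * link$se.fit)
pred_df$upper <- plogis(link$fit + 1.96 * link$se.fit)
pred_df
```

The resulting `pred_df` has the same columns as before, so it can be passed to the same `geom_errorbar()` / `geom_point()` plot. With 3,000 rows per group the two kinds of interval should be nearly identical. The difference matters mostly for small samples or extreme probabilities.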

Data used

set.seed(1)
data <- data.frame(age = rep(c('50 and older', '18-49 years old'), each = 3000),
                   result = rbinom(6000, 1, rep(c(0.3, 0.5), each = 3000)))