Unable to plot binary outcome and continuous predictor?

Time:10-15

I am trying to show how age (V1) is associated with a binary outcome (V2), but I am not having any luck plotting this.

Here are my data:

> dput(head(test, 100))
structure(list(V1 = c(48, 92, 36, NA, 69, NA, NA, 19, 69, 82, 
NA, 39, 42, NA, 68, 72, 27, 78, 42, 15, 79, 48, 38, 46, 17, 33, 
24, 41, 68, 28, 79, NA, 52, 81, 74, 58, 57, 71, 51, 51, 51, 51, 
31, 96, 47, NA, 66, 66, 73, 55, 79, 60, 60, 76, 34, 53, 58, 70, 
80, 33, 17, 54, 42, 64, NA, 72, 53, 55, 59, NA, 68, 71, 70, 77, 
16, 74, 74, 29, 49, NA, 64, 65, 65, 65, 57, 63, 60, 78, 77, 75, 
54, 55, 97, NA, NA, 74, 80, 73, 74, 67), V2 = c(1, 0, 1, NA, 
1, NA, NA, 1, 1, 1, NA, 0, 1, NA, 1, 1, 1, 1, 1, 1, 1, 1, 0, 
1, 1, 1, 1, 0, 1, 1, 0, NA, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 
1, 1, NA, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 
1, NA, 1, 1, 1, 1, NA, 0, 1, 1, 1, 1, 1, 0, 1, 0, NA, 1, 1, 1, 
1, 0, 0, 0, 1, 0, 1, 1, 0, 0, NA, NA, 0, 1, 0, 0, 0)), row.names = c(NA, 
100L), class = "data.frame")

Here is what I attempted to do, but I am not getting any sort of smoothing curve to show how age is associated with the binary outcome:

ggplot(test, aes(x=V1, y=V2)) +
  geom_point(size=2, alpha=0.4) +
  stat_smooth(method="loess", color="blue", size=1.5)

And this is what I am trying to create (although I am open to suggestions for better plotting methods): [image: Ideal Output]

This is my output (haven't changed the axis labels, but the y-axis should be the binary outcome and the x-axis is age): Output

CodePudding user response:

If you have binary outcome data and a numeric predictor, the typical way to model this would be with logistic regression. You can show a logistic regression quite easily in ggplot by passing method = glm and method.args = list(family = binomial) to geom_smooth.

You can augment this by adding the successes and failures as a sort of "rug plot", and adding a few aesthetic tweaks:

library(ggplot2)

ggplot(test, aes(V1, V2)) +
  # Draw each observation as a tick mark at y = 0 or y = 1 (a "rug")
  geom_point(shape = "|", size = 6, na.rm = TRUE, aes(color = factor(V2))) +
  # Logistic regression curve with its confidence band
  geom_smooth(method = glm, method.args = list(family = binomial), na.rm = TRUE,
              formula = y ~ x, color = "navy", fill = "lightblue") +
  coord_cartesian(ylim = c(0, 1), expand = FALSE) +
  labs(x = "Age", y = "Probability") +
  theme_minimal(base_size = 16) +
  theme(axis.line = element_line(color = "gray"),
        axis.ticks = element_line(color = "gray"),
        axis.ticks.length = unit(3, "mm"),
        legend.position = "none")


Note that this is preferable to a plain loess, because a loess (or any other method that does not explicitly account for the binary nature of the data) will give inaccurate confidence intervals. Your target plot has a confidence interval that goes above 100% probability, which clearly doesn't make sense.
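To see why the logistic band cannot leave the 0 to 1 range, here is a minimal sketch in base R that fits the same kind of model by hand. It assumes a small stand-in for `test` built from the first 30 rows of the dput() output in the question; the `grid` data frame and the 1.96 multiplier (an approximate 95% normal interval) are my own choices for illustration. As far as I understand, geom_smooth does something similar for glm fits, but treat that as an assumption rather than a description of its internals.

```r
# Stand-in for `test`: the first 30 rows of the dput() output in the question
test <- data.frame(
  V1 = c(48, 92, 36, NA, 69, NA, NA, 19, 69, 82, NA, 39, 42, NA, 68,
         72, 27, 78, 42, 15, 79, 48, 38, 46, 17, 33, 24, 41, 68, 28),
  V2 = c(1, 0, 1, NA, 1, NA, NA, 1, 1, 1, NA, 0, 1, NA, 1,
         1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1)
)

# Logistic regression; rows containing NA are dropped by the default na.action
fit <- glm(V2 ~ V1, data = test, family = binomial)

# Hypothetical grid of ages at which to evaluate the fitted curve
grid <- data.frame(V1 = seq(15, 95, by = 5))

# Predict on the link (log-odds) scale, where a symmetric interval is reasonable
pred <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)

# Build the interval on the log-odds scale, then map back to probabilities.
# plogis() is the inverse logit, so every value lands strictly between 0 and 1.
grid$prob  <- plogis(pred$fit)
grid$lower <- plogis(pred$fit - 1.96 * pred$se.fit)
grid$upper <- plogis(pred$fit + 1.96 * pred$se.fit)

print(grid)
```

Because the interval is constructed on the log-odds scale and only then transformed, the band stays inside the valid probability range by construction, which is not something a loess fitted to the raw 0/1 values can guarantee.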

CodePudding user response:

Have you tried using the geom_smooth() function instead of stat_smooth()?
