I am trying to show how age (V1) is correlated with a binary outcome (V2); however, I am not having any luck with plotting this.
Here are my data:
> dput(head(test, 100))
structure(list(V1 = c(48, 92, 36, NA, 69, NA, NA, 19, 69, 82,
NA, 39, 42, NA, 68, 72, 27, 78, 42, 15, 79, 48, 38, 46, 17, 33,
24, 41, 68, 28, 79, NA, 52, 81, 74, 58, 57, 71, 51, 51, 51, 51,
31, 96, 47, NA, 66, 66, 73, 55, 79, 60, 60, 76, 34, 53, 58, 70,
80, 33, 17, 54, 42, 64, NA, 72, 53, 55, 59, NA, 68, 71, 70, 77,
16, 74, 74, 29, 49, NA, 64, 65, 65, 65, 57, 63, 60, 78, 77, 75,
54, 55, 97, NA, NA, 74, 80, 73, 74, 67), V2 = c(1, 0, 1, NA,
1, NA, NA, 1, 1, 1, NA, 0, 1, NA, 1, 1, 1, 1, 1, 1, 1, 1, 0,
1, 1, 1, 1, 0, 1, 1, 0, NA, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1,
1, 1, NA, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1,
1, NA, 1, 1, 1, 1, NA, 0, 1, 1, 1, 1, 1, 0, 1, 0, NA, 1, 1, 1,
1, 0, 0, 0, 1, 0, 1, 1, 0, 0, NA, NA, 0, 1, 0, 0, 0)), row.names = c(NA,
100L), class = "data.frame")
Here is what I attempted to do, but I am not getting any sort of smoothing curve to show how age is associated with the binary outcome:
ggplot(test, aes(x=V1, y=V2)) +
  geom_point(size=2, alpha=0.4) +
  stat_smooth(method="loess", color="blue", size=1.5)
And this is what I am trying to create (although I am open to suggestions for better plotting methods).
This is my output (haven't changed the axis labels, but the y-axis should be the binary outcome and the x-axis is age):
CodePudding user response:
If you have binary outcome data and a numeric predictor, the typical way to model this would be with logistic regression. You can show a logistic regression quite easily in ggplot by passing method = glm and method.args = list(family = binomial) to geom_smooth.
You can augment this by adding the successes and failures as a sort of "rug plot", and adding a few aesthetic tweaks:
library(ggplot2)

ggplot(test, aes(V1, V2)) +
  # Tick marks at y = 0 and y = 1 show the raw failures and successes
  geom_point(shape = "|", size = 6, na.rm = TRUE, aes(color = factor(V2))) +
  # Logistic regression curve with its confidence band
  geom_smooth(method = glm, method.args = list(family = binomial), na.rm = TRUE,
              formula = y ~ x, color = "navy", fill = "lightblue") +
  coord_cartesian(ylim = c(0, 1), expand = FALSE) +
  labs(x = "Age", y = "Probability") +
  theme_minimal(base_size = 16) +
  theme(axis.line = element_line(color = "gray"),
        axis.ticks = element_line(color = "gray"),
        axis.ticks.length = unit(3, "mm"),
        legend.position = "none")
Note that this is preferable to a plain loess because a loess (or any other method that does not explicitly account for the binary nature of the data) will give inaccurate confidence intervals (your target plot has a confidence interval that goes above 100% probability, which clearly doesn't make sense).
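To see why the logistic fit cannot produce an interval outside 0 to 1, here is a rough sketch of the same idea done by hand in base R. It uses only the first 20 rows of the question's data for brevity, and the age grid and the 1.96 multiplier (an approximate 95% normal interval) are my own choices for illustration. I believe this is close to what geom_smooth does internally for a glm, but treat that as an assumption rather than a description of ggplot2's code.

```r
# First 20 rows of the question's data (a subset, for brevity)
test <- data.frame(
  V1 = c(48, 92, 36, NA, 69, NA, NA, 19, 69, 82,
         NA, 39, 42, NA, 68, 72, 27, 78, 42, 15),
  V2 = c(1, 0, 1, NA, 1, NA, NA, 1, 1, 1,
         NA, 0, 1, NA, 1, 1, 1, 1, 1, 1)
)

# Logistic regression; with default options glm() drops rows containing NA
fit <- glm(V2 ~ V1, family = binomial, data = test)

# Predict on the link (log-odds) scale over an illustrative grid of ages
grid <- data.frame(V1 = seq(15, 95, by = 5))
pred <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)

# Build the interval on the link scale, then back-transform with plogis(),
# which maps any real number into (0, 1)
grid$prob  <- plogis(pred$fit)
grid$lower <- plogis(pred$fit - 1.96 * pred$se.fit)
grid$upper <- plogis(pred$fit + 1.96 * pred$se.fit)

print(grid)
```

Because the interval is built on the log-odds scale and only then converted to probabilities, the lower and upper bounds always stay between 0 and 1, however wide the uncertainty is.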
CodePudding user response:
Have you tried using the geom_smooth() function instead of stat_smooth()?