I am doing the 5-Day Regression Challenge on Kaggle, but I am having trouble getting a Poisson regression line to plot with ggplot. I want to look at the effect of rainfall on the number of bikes that cross the bridges in NYC.
I have tried reversing the order of geom_point and geom_smooth but that hasn't worked. If I use gaussian instead, that also doesn't work.
Here's what I have written:
ggplot(data = bikes, mapping = aes(x = Precipitation, y = Total)) +
  geom_smooth(method = "glm", method.args = list(family = "poisson")) +
  geom_point() +
  labs(title = 'Precipitation vs. Total Bike Crossings')
Answer:
Inspect the data before plotting it:
unique(bikes$Precipitation)
# [1] "0.01" "0.15" "0.09" "0.47 (S)" "0" "0.2" "T"
# [8] "0.16" "0.24" "0.05"
The regressor Precipitation is not numeric, and some of its values will become NA when coerced to numeric.
If you coerce Precipitation first, the regression line shows up.
library(dplyr)
library(ggplot2)
bikes %>%
  # Convert the character column to numeric; non-numeric strings become NA
  mutate(Precipitation = as.numeric(Precipitation)) %>%
  # Drop the rows that could not be converted
  na.omit() %>%
  ggplot(mapping = aes(x = Precipitation, y = Total)) +
  geom_smooth(method = "glm",
              formula = y ~ x,
              method.args = list(family = "poisson")) +
  geom_point() +
  labs(title = 'Precipitation vs. Total Bike Crossings')
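Note that na.omit() drops the rows whose Precipitation could not be converted ("0.47 (S)" and "T" in the output above). If you would rather keep them, one option is to clean the strings before coercing. The snippet below is only a sketch: it assumes that "T" stands for a trace amount that can be treated as 0 and that a suffix such as "(S)" is a flag that can be discarded. Both are assumptions about this dataset, so check its documentation first. The vector precip is a stand-in for bikes$Precipitation, built from the values shown above.

```r
# Stand-in for bikes$Precipitation, using the values from unique() above
precip <- c("0.01", "0.15", "0.09", "0.47 (S)", "0", "0.2", "T",
            "0.16", "0.24", "0.05")

# Plain coercion: "0.47 (S)" and "T" cannot be parsed and become NA
plain <- suppressWarnings(as.numeric(precip))

# Assumed cleaning rule (not from the dataset docs):
#   "T"        -> "0"    (trace amount treated as zero)
#   "0.47 (S)" -> "0.47" (keep the leading number, drop the flag)
cleaned <- ifelse(precip == "T", "0",
                  sub("^([0-9.]+).*$", "\\1", precip))
clean_num <- as.numeric(cleaned)

sum(is.na(plain))      # number of values lost by plain coercion
sum(is.na(clean_num))  # number of values lost after cleaning
```

With the same rule applied inside mutate(), the na.omit() step would no longer remove those rows because of Precipitation.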