I am trying to add a polynomial best-fit line to a plot, but the line zigzags back and forth across the plot instead of forming a single smooth curve. What should I do to fix it?
yhat_pol = lm(age~poly(height,2), data = Loblolly)
yhat_lin = lm(age~height, data = Loblolly)
plot(x=Loblolly$height, y=Loblolly$age, pch=16, xlab = "Height (ft)", ylab = "Age (yr)", main = "Height vs Age of Loblolly Pine Trees")
lines(Loblolly$height, predict(yhat_pol), type="l", col="green", lwd=2)
lines(Loblolly$height, predict(yhat_lin), type="l", col="red", lwd=2)
lines(sort(Loblolly$height), fitted(yhat_lin)[order(Loblolly$height)])
legend("topleft",
legend = c("linear","polynomial"),
col = c("red","green"),
lty = 1, lwd=3)
| | height | age | Seed |
|---|---|---|---|
| 1 | 4.51 | 3 | 301 |
| 15 | 10.89 | 5 | 301 |
| 29 | 28.72 | 10 | 301 |
| 43 | 41.74 | 15 | 301 |
| 57 | 52.70 | 20 | 301 |
| 71 | 60.92 | 25 | 301 |
| 2 | 4.55 | 3 | 303 |
| 16 | 10.92 | 5 | 303 |
| 30 | 29.07 | 10 | 303 |
| 44 | 42.83 | 15 | 303 |
CodePudding user response:
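You almost had it. lines() connects the points in the order they are given, so sort the heights and reorder the polynomial predictions to match: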
lines(sort(Loblolly$height), predict(yhat_pol)[order(Loblolly$height)], type="l", col="green", lwd=2)
The row order should not cause this problem in ggplot2, because geom_line() connects observations in order of the x variable.
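If it helps, here is a minimal sketch of another way to avoid the zigzag: predict on a grid of heights that is already sorted, rather than reordering the fitted values. The names fit_pol, grid and age_hat are made up for this illustration, and the grid size of 100 is an arbitrary choice.

```r
# Refit the question's quadratic model on the built-in Loblolly data
fit_pol <- lm(age ~ poly(height, 2), data = Loblolly)

# Illustrative grid: 100 evenly spaced heights, sorted by construction
grid <- data.frame(
  height = seq(min(Loblolly$height), max(Loblolly$height), length.out = 100)
)

# Predict age at each grid height; poly() is re-evaluated on the new data
grid$age_hat <- predict(fit_pol, newdata = grid)

plot(age ~ height, data = Loblolly, pch = 16,
     xlab = "Height (ft)", ylab = "Age (yr)")

# x is increasing, so lines() draws one smooth curve
lines(grid$height, grid$age_hat, col = "green", lwd = 2)
```

Because the grid is denser than the observed heights, the curve also looks smoother than one drawn only through the data points.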
CodePudding user response:
You didn't ask for it, but here is a solution using ggplot2, as suggested by Brian Montgomery:
library(dplyr)
library(tidyr)
library(ggplot2)
# Assumes yhat_pol and yhat_lin have been fitted as in the question
Loblolly %>%
  mutate(age_pred_pol = predict(yhat_pol),
         age_pred_lin = predict(yhat_lin)) %>%
  ggplot(aes(x = height, y = age)) +
  geom_point() +
  geom_line(aes(y = age_pred_pol, color = "Polynomial Best Fit")) +
  geom_line(aes(y = age_pred_lin, color = "Linear Best Fit")) +
  labs(x = "Height (ft)", y = "Age (yr)", title = "Height vs Age of Loblolly Pine Trees") +
  scale_colour_manual(name = "Prediction",
                      breaks = c("Polynomial Best Fit", "Linear Best Fit"),
                      values = c("red", "green")) +
  theme_bw()
This returns a scatter plot of the data with the two fitted lines overlaid.
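As a possible variation, ggplot2 can fit and draw both models itself with geom_smooth(), so no predictions need to be computed beforehand. This is only a sketch: the object name p and the fixed colours are choices made for this illustration, and the lines get no legend because the colours are set outside aes().

```r
library(ggplot2)

# as.data.frame() drops the groupedData classes that Loblolly carries
p <- ggplot(as.data.frame(Loblolly), aes(x = height, y = age)) +
  geom_point() +
  # Quadratic fit, evaluated by ggplot2 on its own sorted grid of x values
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE, colour = "green") +
  # Straight-line fit for comparison
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "red") +
  labs(x = "Height (ft)", y = "Age (yr)") +
  theme_bw()

print(p)
```

Here the models are fitted inside the plotting call with the same formulas as in the question, age regressed on height.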