Why are stats::loess and geom_smooth(method = "loess") different?

Time:04-20

geom_smooth() (RED) appears to be more "smooth" when plotted in ggplot2 than if I plot the values of stats::loess with geom_line() (BLUE).

Why? And how do you make the geom_line() like the line produced by geom_smooth()?

Reprex:

library(dplyr)
library(ggplot2)

# Data
data <- structure(list(date_int = c(0.834136630343671, 0.848910310142498, 
    0.851948868398994, 0.857082984073764, 0.866093880972339, 0.86955155071249, 
    0.874895222129086, 0.925660100586756, 0.937709555741827, 0.957355406538139, 
    0.977525146689019, 0.996070829840738, 0.998428331936295, 0.998428331936295, 
    0.998480720871752, 0.998795054484493, 0.999161777032691, 0.999528499580889, 
    0.999895222129086, 1, 1), value = c(51.78, 46.2, 44.01, 41.1, 
    39.1, 38.19, 42.87, 42.47, 37.22, 41.6, 44.7, 39.7, 23, 28.7, 
    23, 30.9, 35.4, 35.8, 32.4, 31, 31)), row.names = c(NA, -21L), class = c("tbl_df", 
    "tbl", "data.frame"))

# Add the loess fitted values manually (one value per observed row)
data <- data %>%
  mutate(pred_loess = stats::loess(value ~ date_int, method = "loess")$fitted)

# Plot red and blue
ggplot(data,
       aes(x = date_int,
           y = value)) +
  geom_point() +
  geom_smooth(colour = "red", size = 1, se = FALSE) +
  geom_line(aes(y = pred_loess), colour = "blue", size = 1) +
  labs(title = "RED (geom_smooth) is smoother\nthan BLUE (geom_line)")


Answer:

To manually plot the loess line, make a new dataframe with regularly spaced x-values and use the predict() function to find the values for the y-variable.

library(dplyr)
library(ggplot2)

# Data
data <- structure(list(date_int = c(0.834136630343671, 0.848910310142498, 
    0.851948868398994, 0.857082984073764, 0.866093880972339, 0.86955155071249, 
    0.874895222129086, 0.925660100586756, 0.937709555741827, 0.957355406538139, 
    0.977525146689019, 0.996070829840738, 0.998428331936295, 0.998428331936295, 
    0.998480720871752, 0.998795054484493, 0.999161777032691, 0.999528499580889, 
    0.999895222129086, 1, 1), value = c(51.78, 46.2, 44.01, 41.1, 
    39.1, 38.19, 42.87, 42.47, 37.22, 41.6, 44.7, 39.7, 23, 28.7, 
    23, 30.9, 35.4, 35.8, 32.4, 31, 31)), row.names = c(NA, -21L), class = c("tbl_df", 
    "tbl", "data.frame"))

fit <- stats::loess(value ~ date_int, data = data)

# Make data.frame for loess trend
fit_df <- data.frame(
  date_int = seq(min(data$date_int), max(data$date_int), length.out = 500)
)
fit_df$value <- predict(fit, newdata = fit_df)

# Plot red and blue
ggplot(data,
       aes(x = date_int,
           y = value)) +
  geom_point() +
  geom_smooth(colour = "red", size = 1, se = FALSE) +
  geom_line(data = fit_df, colour = "blue", size = 1) +
  labs(title = "RED (geom_smooth) is smoother\nthan BLUE (geom_line)")
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Created on 2022-04-20 by the reprex package (v0.3.0)
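
To see the grid that geom_smooth() itself draws, one option is to pull the computed layer data out of the plot with ggplot2's layer_data(). The sketch below is only an illustration: it uses the built-in cars dataset rather than the data above, and the name smooth_df is an arbitrary choice. With the default n = 80 of stat_smooth() and a non-integer x variable, the smoother should be evaluated at 80 evenly spaced x-values.

```r
library(ggplot2)

# Illustrative data: the built-in `cars` dataset, not the data from the question
p <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

# Layer 1 is the geom_smooth() layer; this returns the points it would draw
smooth_df <- layer_data(p, 1)

nrow(smooth_df)      # number of grid points used for the smooth line
range(smooth_df$x)   # the grid spans the observed x-range
```

Passing a different n to geom_smooth() changes the number of grid points, which is the counterpart of length.out in the manual approach.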

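As mentioned in the comments, your previous approach only gave fitted values at the x-values present in your dataframe, so geom_line() joined those 21 points with straight segments. geom_smooth() instead evaluates the fit on an evenly spaced sequence along the x-axis (80 points by default, controlled by its n argument), which is why its line looks smoother.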
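
The same point can be checked without any plotting. The following is a minimal sketch using only base R and the built-in cars dataset (chosen purely for illustration, it is not the data from the question); the names at_obs and grid are arbitrary. It compares the $fitted component with predictions at the observed x-values, then predicts on an evenly spaced grid.

```r
# Illustrative data: the built-in `cars` dataset, not the data from the question
fit <- stats::loess(dist ~ speed, data = cars)

# fit$fitted holds one value per observation, i.e. the curve evaluated
# only at the observed x-values
at_obs <- predict(fit, newdata = data.frame(speed = cars$speed))
all.equal(unname(at_obs), unname(fit$fitted))

# Predicting on a dense, evenly spaced grid over the observed range
# gives the points needed for a smooth-looking line
grid <- data.frame(
  speed = seq(min(cars$speed), max(cars$speed), length.out = 80)
)
grid$dist <- predict(fit, newdata = grid)
nrow(grid)
```

The two sets of values at the observed points should agree up to numerical tolerance, so the model is the same in both cases; only the x-values at which it is evaluated differ.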
