ggeffects giving different prediction results from lm extracted as parsnip model, despite same coeff

Time:12-13

I have a question about predictions using ggeffects, which gives me completely different results depending on whether I use a traditional lm fit or an extracted parsnip model fit (despite both having the same coefficients). Here is an example...

library(tidyverse)
library(tidymodels)
library(ggeffects)

test_df <- structure(list(weight = c(-1.7, 0, 0.6, 0.6, -0.7, -0.3, -0.6, 
-1, -1, 2, 0.1, -0.6, -1.5, 2, -0.7, -0.2, -0.9, -0.6, 1.1, -2, 
1.4, -1, -1.1, 0.5, 1.3, 0, -0.5, -3, 1.1, -0.6), steps = c(19217, 
15758, 14124, 14407, 5565, 20860, 17536, 17156, 17219, 652, 1361, 
8524, 1169, 3117, 3135, 1917, 4267, 7067, 8927, 2436, 3014, 5281, 
8104, 6836, 8939, 4923, 6885, 10581, 10370, 11024), calories = c(1943, 
1581, 1963, 1551, 1699, 1789, 1550, 2036, 1707, 1522, 1672, 1994, 
1588, 1506, 1678, 1673, 1662, 1906, 1814, 1609, 1799, 1825, 1654, 
2291, 1788, 2019, 1911, 1589, 2177, 2137)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -30L)) %>% 
  as_tibble(.)

#lm fit
lmmod_simp <- lm(weight ~  steps * calories, data = test_df)

#tidymodels
linear_reg_lm_spec <-
  linear_reg() %>%
  set_engine('lm')

basic_rec <- recipe(weight ~ steps + calories, test_df) %>% 
  step_interact(terms = ~ steps:calories) 

lm_wflw <- workflow() %>% 
  add_recipe(basic_rec) %>% 
  add_model(linear_reg_lm_spec) 

lm_fit <- fit(lm_wflw, data = test_df)

lm_fit_extracted <- lm_fit %>% extract_fit_parsnip() 

When I look at the output, both have the same coefficients

lmmod_simp

lm_fit_extracted

But when I go to predict, the predictions are completely different

ggemmeans(lmmod_simp, terms = c("steps", "calories[1500,2000,2500]")) %>%
  as.data.frame() %>%
  ggplot(aes(x, predicted, color = group, linetype = group)) +
  geom_line()

[Image: plot of predictions from lmmod_simp]

ggemmeans(lm_fit_extracted, terms = c("steps", "calories[1500,2000,2500]")) %>%
  as.data.frame() %>%
  ggplot(aes(x, predicted, color = group, linetype = group)) +
  geom_line()

[Image: plot of predictions from lm_fit_extracted]

Perhaps I can't/shouldn't use the parsnip fit object in this way, but it seems odd since they are showing the same coefficients.

I appreciate any help!

Answer:

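You are getting different results because, as far as ggeffects can tell, lmmod_simp and lm_fit_extracted are different models. lmmod_simp was fit with the formula weight ~ steps * calories, so it knows that the interaction term is the product of steps and calories. lm_fit_extracted has no idea about this interaction: it received the data after the recipe had already computed the interaction, so it treats that column as just another independent predictor. When ggemmeans varies steps and calories, the precomputed interaction column is not recomputed to match, so the predictions differ even though the coefficients are the same.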

It is generally not recommended to pull models out of a workflow object if you plan to use them for anything other than diagnostics.
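As a rough illustration of the point above, here is a minimal sketch that predicts from the fitted workflow itself rather than from the extracted model. It uses the built-in mtcars data as a stand-in for test_df (an assumption, chosen only to keep the example self-contained), and the names lm_direct, wflw_fit, and pred_grid are made up for this sketch. The idea is that predict() on the workflow re-applies the recipe, so the interaction column is recomputed for every row of new data.

```r
library(tidymodels)

# Stand-in data (assumption): mtcars replaces the question's test_df
lm_direct <- lm(mpg ~ wt * hp, data = mtcars)

wflw_fit <- workflow() %>%
  add_recipe(
    recipe(mpg ~ wt + hp, data = mtcars) %>%
      step_interact(terms = ~ wt:hp)
  ) %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  fit(data = mtcars)

# The extracted lm sees the interaction as an ordinary precomputed column,
# not as a product of wt and hp
extracted_terms <- wflw_fit %>%
  extract_fit_engine() %>%
  terms() %>%
  attr("term.labels")
print(extracted_terms)

# Grid containing only the raw predictors
pred_grid <- expand.grid(wt = c(2, 3, 4), hp = c(100, 200))

# predict() on the workflow applies the recipe first, so the interaction
# column is rebuilt from wt and hp for each row of the grid
wflw_pred <- predict(wflw_fit, new_data = pred_grid)$.pred
direct_pred <- unname(predict(lm_direct, newdata = pred_grid))

# Compare workflow predictions with the plain lm fit
print(all.equal(wflw_pred, direct_pred))
```

With the objects from the question, the same pattern would be to build a grid of steps and calories values and call predict(lm_fit, new_data = grid) on the workflow, then plot the .pred column.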
