Workflow Tidymodels Formula Object

Time:09-29

I am fairly new to R and am following a guide to build an Expected Goals model for my hockey league. When I run the code below, I get the error shown at the bottom. Is there something simple that I am missing?

It seems like it's trying to use a formula in the model portion of the workflow, but I already have a recipe in there. Thanks in advance for any help anyone can offer! The guide is here: https://www.thesignificantgame.com/portfolio/expected-goals-model-with-tidymodels/

library(tidymodels)
library(tidyverse)
library(dplyr)

set.seed(1972)
train_test_split <- initial_split(data = EXPECTED_GOALS_MODEL, prop = 0.80)
train_data <- train_test_split %>% training() 
test_data  <- train_test_split %>% testing()
    
xg_recipe <- recipe(Goal ~ DistanceC + Angle + Home + Hand + AgeDec31 + GoalieAgeDec31 + NewX + NewY,
                    data = train_data) %>%
  update_role(NewX, NewY, new_role = "ID")
    
model <- logistic_reg() %>% set_engine("glm")
    
xg_wflow <- workflow() %>% add_model(model) %>% add_recipe(xg_recipe)

xg_wflow
    
xg_fit <- xg_wflow %>% fit(data = train_data)

Error in validObject(.Object) : 
  invalid class “model” object: invalid object for slot "formula" in class "model": got class "workflow", should be or extend class "formula"
In addition: Warning message:
In fit(., data = train_data) :
  fit failed: Error in as.matrix(y) : argument "y" is missing, with no default
 fit(x = ., data = train_data) 
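Since the question suspects the workflow is treating a formula as the model, one way to check what the workflow actually holds is to inspect it before fitting. This is a sketch under stated assumptions: the tibble `toy` is made-up data that reuses a few of the question's column names, not the real EXPECTED_GOALS_MODEL data. `summary()` on a recipe lists each variable's role, and `extract_preprocessor()` (from workflows) returns whatever preprocessor was added to the workflow.

```r
library(tidymodels)

# Made-up toy data standing in for EXPECTED_GOALS_MODEL (hypothetical values;
# only some of the question's column names are reused).
set.seed(1972)
toy <- tibble(
  Goal = factor(sample(c("no", "yes"), 100, replace = TRUE)),
  DistanceC = runif(100, 5, 60),
  Angle = runif(100, 0, 90),
  NewX = runif(100, -100, 100),
  NewY = runif(100, -42, 42)
)

toy_recipe <- recipe(Goal ~ DistanceC + Angle + NewX + NewY, data = toy) %>%
  update_role(NewX, NewY, new_role = "ID")

# Each variable with its role: NewX and NewY should be "ID", not "predictor".
roles <- summary(toy_recipe)
print(roles)

toy_wflow <- workflow() %>%
  add_model(logistic_reg() %>% set_engine("glm")) %>%
  add_recipe(toy_recipe)

# The preprocessor stored in the workflow is the recipe itself, not a formula.
class(extract_preprocessor(toy_wflow))   # "recipe"
```

If this looks as expected on the real data, the recipe and workflow are probably not the problem, which points back at the data or at which `fit()` is being called.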

Answer:

It's difficult to tell exactly what the issue is without a reproducible example, though this error raises a few questions for me:

  • Does the EXPECTED_GOALS_MODEL data actually have a column called Goal, and is it a factor with exactly two levels? Are the remaining column names spelled correctly?
  • Are your tidymodels package installs up to date?
  • Does the error persist if you call the generic explicitly, i.e. xg_wflow %>% generics::fit(data = train_data) instead of xg_wflow %>% fit(data = train_data)? The error almost looks like a different fit() is being dispatched to.
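The first and third questions can be checked directly. Below is a minimal sketch, assuming a hypothetical helper `check_xg_data()` (not part of tidymodels) and made-up toy data `shots` in which Goal is stored as 0/1 numbers rather than a factor. It also uses base R's `find()` to list which attached packages provide an object called `fit`, and then calls `generics::fit()` explicitly so no masking can interfere.

```r
library(tidymodels)

# Hypothetical helper: lists problems with the outcome and predictor columns
# before fitting a logistic regression.
check_xg_data <- function(data, outcome, predictors) {
  problems <- character()
  missing_cols <- setdiff(c(outcome, predictors), names(data))
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }
  if (outcome %in% names(data)) {
    y <- data[[outcome]]
    if (!is.factor(y)) {
      problems <- c(problems, paste(outcome, "is not a factor"))
    }
    if (dplyr::n_distinct(y, na.rm = TRUE) != 2) {
      problems <- c(problems, paste(outcome, "does not have exactly two distinct values"))
    }
  }
  problems
}

# Made-up toy data; Goal is numeric 0/1 here on purpose.
set.seed(1972)
shots <- tibble(
  Goal = sample(c(0, 1), 100, replace = TRUE),
  DistanceC = runif(100, 5, 60),
  Angle = runif(100, 0, 90)
)
preds <- c("DistanceC", "Angle")

check_xg_data(shots, "Goal", preds)   # "Goal is not a factor"

# Convert the outcome to a two-level factor, then re-check.
shots <- shots %>% mutate(Goal = factor(Goal, levels = c(0, 1)))
check_xg_data(shots, "Goal", preds)   # character(0)

# Attached packages that provide `fit`; an unexpected package here could
# explain a different fit() being called.
find("fit")

# Calling the generic explicitly sidesteps any masking.
rec <- recipe(Goal ~ DistanceC + Angle, data = shots)
wflow <- workflow() %>%
  add_model(logistic_reg() %>% set_engine("glm")) %>%
  add_recipe(rec)
shots_fit <- wflow %>% generics::fit(data = shots)
```

The helper only reports problems rather than fixing them, so the conversion to a factor stays an explicit, visible step in the analysis.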

Here's a place to start with a reprex:

library(tidymodels)
data(ames)

set.seed(1972)
ames <- ames %>% rowid_to_column()
train_test_split <- initial_split(data = ames, prop = 0.80)
train_data <- train_test_split %>% training() 
test_data  <- train_test_split %>% testing()

xg_recipe <- recipe(Sale_Price ~ ., data = train_data) %>% update_role(rowid, new_role = "ID")

model <- linear_reg() %>% set_engine("glm")

xg_wflow <- workflow() %>% add_model(model) %>% add_recipe(xg_recipe)

xg_fit <- xg_wflow %>% fit(data = train_data)

xg_fit
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 0 Recipe Steps
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> 
#> Call:  stats::glm(formula = ..y ~ ., family = stats::gaussian, data = data)
#> 
#> Coefficients:
#>                                          (Intercept)  
#>                                           -2.583e+07  
#>                  MS_SubClassOne_Story_1945_and_Older  
#>                                            7.419e+03  
#>    MS_SubClassOne_Story_with_Finished_Attic_All_Ages  
#>                                            1.562e+04  
#>    MS_SubClassOne_and_Half_Story_Unfinished_All_Ages  
#>                                            1.060e+04  
#>      MS_SubClassOne_and_Half_Story_Finished_All_Ages  
#>                                            8.413e+03  
#>                  MS_SubClassTwo_Story_1946_and_Newer  
#>                                            3.007e+03  
#>                  MS_SubClassTwo_Story_1945_and_Older  
#>                                            1.793e+04  
#>               MS_SubClassTwo_and_Half_Story_All_Ages  
#>                                           -3.909e+03  
#>                       MS_SubClassSplit_or_Multilevel  
#>                                           -1.098e+04  
#>                               MS_SubClassSplit_Foyer  
#>                                           -4.038e+03  
#>                MS_SubClassDuplex_All_Styles_and_Ages  
#>                                           -2.004e+04  
#>              MS_SubClassOne_Story_PUD_1946_and_Newer  
#>                                           -2.335e+04  
#>           MS_SubClassOne_and_Half_Story_PUD_All_Ages  
#>                                           -2.482e+04  
#>              MS_SubClassTwo_Story_PUD_1946_and_Newer  
#>                                           -1.794e+04  
#>          MS_SubClassPUD_Multilevel_Split_Level_Foyer  
#>                                           -2.098e+04  
#> MS_SubClassTwo_Family_conversion_All_Styles_and_Ages  
#>                                            6.903e+03  
#>                    MS_ZoningResidential_High_Density  
#>                                           -3.853e+03  
#>                     MS_ZoningResidential_Low_Density  
#>                                           -3.661e+03  
#>                  MS_ZoningResidential_Medium_Density  
#>                                           -8.240e+03  
#>                                       MS_ZoningA_agr  
#>                                           -3.824e+03  
#>                                       MS_ZoningC_all  
#>                                           -1.800e+04  
#>                                       MS_ZoningI_all  
#>                                           -3.299e+04  
#>                                         Lot_Frontage  
#>                                            1.336e+01  
#> 
#> ...
#> and 506 more lines.

Created on 2022-09-28 by the reprex package (v2.0.1)
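The reprex above is a regression; a closer analogue to the question is a binary classifier. Here is a sketch, assuming the `two_class_dat` data set from the modeldata package (numeric predictors `A` and `B`, factor outcome `Class`). It mirrors the question's setup: a formula naming specific predictors, an ID column kept out of the model via `update_role()`, and a `logistic_reg()` glm model inside a workflow.

```r
library(tidymodels)
data(two_class_dat)   # modeldata: predictors A, B and a two-level factor Class

set.seed(1972)
two_class <- two_class_dat %>% rowid_to_column()
split <- initial_split(two_class, prop = 0.80)
train_data <- training(split)
test_data <- testing(split)

# Same shape as the question's recipe: listed predictors plus an ID column
# that is carried along but not used as a predictor.
cls_recipe <- recipe(Class ~ A + B + rowid, data = train_data) %>%
  update_role(rowid, new_role = "ID")

cls_model <- logistic_reg() %>% set_engine("glm")

cls_wflow <- workflow() %>%
  add_model(cls_model) %>%
  add_recipe(cls_recipe)

cls_fit <- cls_wflow %>% fit(data = train_data)

# Class probabilities for the held-out rows (one column per outcome level).
test_probs <- predict(cls_fit, new_data = test_data, type = "prob")
head(test_probs)
```

In an expected-goals model, the probability column for the "goal" level would play the role of the xG value for each shot. If this runs cleanly but the original code does not, the difference likely lies in the data (for example, a non-factor Goal column) or in a masked `fit()`.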

Hope this helps!

Simon, tidymodels team
