In the tidymodels ecosystem, is there an equivalent to collect_metrics() that will evaluate model performance on a training dataset without using resampling?
Why?
The collect_metrics() function is a lovely way to extract model performance metrics with resampling. I am teaching and would love to apply collect_metrics() to simple fit() models to make the point about how overly optimistic results are when you evaluate on your training data.
Showing the full process of fitting the model to the training data and then calling the individual model evaluation functions (e.g., accuracy(), roc_auc(), etc. for a logistic model) is useful but a very distracting tangent that I am trying to avoid. I am thinking I could build a function that calls the "default" collect_metrics() metrics on a "model_fit" object, but I am hoping somebody beat me to it.
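For reference, the tangent I am trying to avoid looks roughly like this (a sketch using the two_class_dat data from modeldata; column names like .pred_Class1 depend on the outcome's factor levels):
library(tidymodels)

data("two_class_dat")

mod_fit <- logistic_reg() %>%
  fit(Class ~ ., data = two_class_dat)

# Collect hard-class and class-probability predictions on the training data
preds <- predict(mod_fit, new_data = two_class_dat) %>%
  bind_cols(predict(mod_fit, new_data = two_class_dat, type = "prob")) %>%
  bind_cols(two_class_dat %>% select(Class))

# Call each yardstick metric separately
accuracy(preds, truth = Class, estimate = .pred_class)
roc_auc(preds, truth = Class, .pred_Class1)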
CodePudding user response:
You can do it in two lines via augment() and a metric set:
library(tidymodels)
tidymodels_prefer()
theme_set(theme_bw())
options(pillar.advice = FALSE, pillar.min_title_chars = Inf)
data("two_class_dat")
mod_fit <-
  logistic_reg() %>%
  fit(Class ~ ., data = two_class_dat)
# Make your own metric set
some_metrics <- metric_set(accuracy, roc_auc)
# Get predictions on the training set
augment(mod_fit, new_data = two_class_dat) %>%
  # Evaluate the metric set
  some_metrics(Class, .pred_Class1, estimate = .pred_class)
#> # A tibble: 2 × 3
#>   .metric  .estimator .estimate
#>   <chr>    <chr>          <dbl>
#> 1 accuracy binary         0.819
#> 2 roc_auc  binary         0.888
Created on 2022-11-29 by the reprex package (v2.0.1)
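If you want something closer to a reusable, collect_metrics()-style helper for a fitted parsnip model, you could wrap those same two steps in a small function. This is only a sketch; the name eval_training() and the fixed metric set are my own choices, not part of tidymodels:
# A minimal collect_metrics()-style helper for a "model_fit" object;
# eval_training() is a made-up name, not a tidymodels function
eval_training <- function(fit, data, truth, ...) {
  default_metrics <- metric_set(accuracy, roc_auc)
  augment(fit, new_data = data) %>%
    default_metrics(truth = {{ truth }}, ..., estimate = .pred_class)
}

# Same results as above, evaluated on the training data
eval_training(mod_fit, two_class_dat, Class, .pred_Class1)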