I am trying to fit several linear models using the tidyverse in R. For each fit I want to print the results using summary(), call a custom function that returns statistics summary() does not report (such as AIC values), and then apply the model to predict values for a set of known data (a test dataset). Here is an example of what I am doing, using the mtcars dataset:
library(tidyverse)
library(magrittr)

# model summary
mtcars %>%
  filter(gear == 4) %$%
  lm(hp ~ mpg) %>%
  summary()

# AIC of the same model
mtcars %>%
  filter(gear == 4) %$%
  lm(hp ~ mpg) %>%
  AIC()

# prediction from the same model
mtcars %>%
  filter(gear == 4) %$%
  lm(hp ~ mpg) %>%
  predict(newdata = data.frame(mpg = 19))
I often do a lot of filtering of my data before calling lm() (because of missing data that are not missing for all models, mutate() calls, summarise() calls, or filtering on a categorical variable of interest), and I fit many different model permutations. As a result, I end up running the same code several times just to obtain the summary statistics.
Normally I would just save the lm models as objects, but here I only want to run a preliminary test to see whether a given version is worth saving, and I don't want large numbers of lm objects cluttering up my global environment. However, it seems that once a pipe is called after lm(), the temporary lm object cannot be used again.
Is there a tidy way to retain a fitted lm object and fork it within the same chain of code, so that I can print the results of summary(), predict(), and AIC() in a single call?
CodePudding user response:
A magrittr pipeline allows a code block in braces on the right-hand side, inside which . is the value coming from the chain. So
mtcars %>%
  filter(gear == 4) %$%
  lm(hp ~ mpg) %>%
  {
    list(
      summary(.),
      AIC(.),
      predict(., newdata = data.frame(mpg = 19))
    )
  }
will work.
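If you want to pull out the individual pieces afterwards, the same brace block can return a named list. This is only a minimal sketch: the element names and the object name res are my own choices for illustration, and dropping the `res <-` assignment simply prints everything without storing anything.

```r
library(dplyr)
library(magrittr)

# Fit once, then fork the fitted model inside the brace block.
# `res` and the list element names are hypothetical, chosen for this example.
res <- mtcars %>%
  filter(gear == 4) %$%
  lm(hp ~ mpg) %>%
  {
    list(
      summary    = summary(.),
      AIC        = AIC(.),
      prediction = predict(., newdata = data.frame(mpg = 19))
    )
  }

# Access the pieces by name
res$AIC
res$prediction
```

Only the single list ends up in the environment, rather than one lm object per model variant.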
You could also use the %T>% (tee) pipe, which passes its left-hand side on unchanged instead of the result of the right-hand side. You will need to explicitly print the values in the chain if you want to see them:
mtcars %>%
  filter(gear == 4) %$%
  lm(hp ~ mpg) %T>%
  {print(summary(.))} %T>%
  {print(AIC(.))} %>%
  predict(newdata = data.frame(mpg = 19))
CodePudding user response:
One option is to write a custom function that produces all the desired outputs together. Then you can feed in whatever data you like in a single line.
library(tidyverse)
## function to produce all desired outputs in one object
f <- function(train_data = mtcars,
              x = "mpg",
              y = "hp",
              test_data = data.frame(mpg = 19)) {
  formula <- as.formula(paste0(y, "~", x))
  mod <- lm(formula, data = train_data)
  list(
    summary = summary(mod),
    AIC = AIC(mod),
    prediction = predict(mod, test_data)
  )
}
f()
#> $summary
#>
#> Call:
#> lm(formula = formula, data = train_data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -59.26 -28.93 -13.45 25.65 143.36
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 324.08 27.43 11.813 8.25e-13 ***
#> mpg -8.83 1.31 -6.742 1.79e-07 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 43.95 on 30 degrees of freedom
#> Multiple R-squared: 0.6024, Adjusted R-squared: 0.5892
#> F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07
#>
#>
#> $AIC
#> [1] 336.8553
#>
#> $prediction
#> 1
#> 156.3174
Created on 2022-07-21 by the reprex package (v2.0.1)
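To connect this back to the filtering in the question, here is a sketch of how f() could sit at the end of a pipeline. It repeats the definition of f() from above so that it runs on its own; the object name out is just for illustration, and I am assuming the filtered data frame is meant to go in as train_data (the first argument).

```r
library(dplyr)

# Same function as above, repeated so this snippet is self-contained
f <- function(train_data = mtcars,
              x = "mpg",
              y = "hp",
              test_data = data.frame(mpg = 19)) {
  formula <- as.formula(paste0(y, "~", x))
  mod <- lm(formula, data = train_data)
  list(
    summary = summary(mod),
    AIC = AIC(mod),
    prediction = predict(mod, test_data)
  )
}

# The piped data frame becomes train_data; `out` is a hypothetical name
out <- mtcars %>%
  filter(gear == 4) %>%
  f(x = "mpg", y = "hp", test_data = data.frame(mpg = 19))

out$AIC
out$prediction
```

Because the lm object only exists inside f(), nothing but the returned list is left behind once the call finishes.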