I have built a linear regression model, reg_model1, that includes factors. However, among the factor levels and the continuous variables in the model, very few are significant. Is there any code I can apply to reg_model1 to produce a summary that shows only the predictors that best fit the model?
CodePudding user response:
From a statistical point of view, I think you are confusing two different things: which independent variables significantly influence the dependent variable, and the goodness of fit of the model. My advice is to be clear about what you are trying to obtain. That said, if you want a representation of your model that includes only some of the terms, you can turn it into a data frame with broom::tidy and filter the rows:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(broom)
### Create factors ###
mtcars <- mutate(mtcars, across(c(vs, am, gear), as.factor))
### Fit the model, tidy it, and keep only terms with p <= 0.05 ###
lm(mpg ~ disp + vs + am + gear, data = mtcars) |>
  tidy() |>
  filter(p.value <= 0.05)
#> # A tibble: 3 × 5
#>   term        estimate std.error statistic      p.value
#>   <chr>          <dbl>     <dbl>     <dbl>        <dbl>
#> 1 (Intercept)  24.7      3.36         7.34 0.0000000865
#> 2 disp         -0.0282   0.00924     -3.05 0.00518
#> 3 am1           4.67     2.09         2.23 0.0345
Created on 2021-11-20 by the reprex package (v2.0.1)
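If you would rather not depend on broom and dplyr, the same filtering can be sketched in base R. This is only an illustration: the names cars, fit, and signif_coefs are made up here, and the 0.05 cutoff is the same arbitrary threshold as above. The last line is an addition of mine rather than part of the approach above: because a factor contributes one row per level, drop1() with an F test is one way to check each term as a whole.

```r
# Work on a local copy so the built-in mtcars is left untouched (illustrative name).
cars <- transform(mtcars, vs = factor(vs), am = factor(am), gear = factor(gear))
fit <- lm(mpg ~ disp + vs + am + gear, data = cars)

# summary()$coefficients is a numeric matrix; the p-values are in "Pr(>|t|)".
coefs <- summary(fit)$coefficients

# Keep only the rows at or below the 0.05 threshold; drop = FALSE keeps
# the result a matrix even if a single row survives.
signif_coefs <- coefs[coefs[, "Pr(>|t|)"] <= 0.05, , drop = FALSE]
print(signif_coefs)

# Per-level p-values do not test a factor as a whole;
# drop1() reports one F test per model term instead.
print(drop1(fit, test = "F"))
```

Keep in mind that this only hides rows from the printed table. The model itself is unchanged and still contains every term.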
CodePudding user response:
I'd suggest stepwise regression (stepwise selection). With this you can choose a best-fitting model based on a criterion such as RMSE or, as step() does by default, AIC. Here's a good source that works through it on the mtcars dataset. Several other packages offer pretty much the same thing. I personally prefer to use the step function for this purpose:
step.model <- step(lm(mpg ~ ., data = mtcars), direction = "both", trace = FALSE)
summary(step.model)
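To see which predictors the search actually kept, you can inspect the returned object. The sketch below assumes the untransformed built-in mtcars and uses an illustrative name, full.model, for the starting fit; it is one way to look at the result, not the only one.

```r
# Start from the full model on the built-in mtcars data (illustrative name).
full.model <- lm(mpg ~ ., data = mtcars)

# Stepwise search in both directions; step() compares models by AIC by default.
step.model <- step(full.model, direction = "both", trace = FALSE)

# Formula of the model the search settled on.
print(formula(step.model))

# step() attaches an "anova" component recording each term that was
# dropped or added, together with the AIC at that point.
print(step.model$anova)

# Compare the starting and selected models side by side.
print(AIC(full.model, step.model))
```

Because the search minimises AIC, the selected model is not guaranteed to contain only terms with small p-values, so summary(step.model) may still show some non-significant coefficients.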