Auto VIF (Variance inflation factor) for variable analysis

Time:10-13

I want to calculate an "Auto" VIF for variables, one vs the others. For example in the iris dataset, I want Sepal.Width to act as target value and the others as explanatory variables.

First I remove the Species column, so only the numeric variables remain. Then I want to loop over each variable and test it against the others. Finally, I want the VIF results to be stored in a list.

This is what I have tried:

library(car)
library(dplyr)
library(tidyr)

iris_clean <- iris %>%
  select(-Species)

col_names <- colnames(iris)
i <- 1
for(col in col_names) {
  regr <- lm(col ~ ., data=iris_clean) 
  list[i] <- vif(regr)  
  i <- i + 1
}

For some reason I get an error:

Error in model.frame.default(formula = col ~ ., data = iris_clean, drop.unused.levels = TRUE) : 
  variable lengths differ (found for 'Sepal.Length')

Which I don't understand because the variables have the same length. Please, any help will be greatly appreciated.

Answer:

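The error occurs because col inside col ~ . is not replaced by the column name. lm() looks up the variable col itself, which is a character string of length 1, while the other columns have 150 rows, hence "variable lengths differ". Note also that col_names is taken from iris rather than iris_clean, so it still includes Species. Build the formula from the column name as a string instead. You can try this -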

library(car)
library(dplyr)
library(tidyr)

iris_clean <- iris %>% select(-Species)
col_names <- colnames(iris_clean)
result <- vector('list', length(col_names))

for(i in seq_along(col_names)) {
  regr <- lm(paste0(col_names[i], '~ .'), data=iris_clean) 
  result[[i]] <- vif(regr)  
}

result

#[[1]]
# Sepal.Width Petal.Length  Petal.Width 
#    1.270815    15.097572    14.234335 

#[[2]]
#Sepal.Length Petal.Length  Petal.Width 
#    4.278282    19.426391    14.089441 

#[[3]]
#Sepal.Length  Sepal.Width  Petal.Width 
#    3.415733     1.305515     3.889961 

#[[4]]
#Sepal.Length  Sepal.Width Petal.Length 
#    6.256954     1.839639     7.557780 
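
If it helps to see which target each element belongs to, the same loop can be written so that the list is named by target column. This is only a sketch of one possible variant, not part of the original answer: it assumes the car package is installed, and it uses reformulate() from base R's stats package to build the formula instead of paste0().

```r
library(car)

# Keep only the numeric measurement columns (drop the Species factor)
iris_clean <- iris[, setdiff(names(iris), "Species")]
col_names <- colnames(iris_clean)

# Fit one model per target column; reformulate(".", response = target)
# builds the formula  target ~ .  from the column name given as a string
result <- lapply(col_names, function(target) {
  regr <- lm(reformulate(".", response = target), data = iris_clean)
  vif(regr)
})

# Name each element after the column used as the target
names(result) <- col_names

# VIFs of the remaining variables when Sepal.Width is the target
result[["Sepal.Width"]]
```

With names attached, result[["Sepal.Length"]] should correspond to element [[1]] printed above, result[["Sepal.Width"]] to element [[2]], and so on.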