How to add a step to remove a column with constant value?

Time:09-16

Background: I'm creating a recipe to clean and transform time-series data that will be used by multiple models. One of the steps in the recipe is to remove correlated predictors using the step_corr() function.

However, because of the nature of the data set, some variables can be constant across an entire training window during rolling-window cross-validation, which causes step_corr() to throw a warning.

Problem Statement: In such cases, is it possible to exclude such variables from the correlation step? Or perhaps remove the variable entirely?

P.S. I know I can easily ignore the warning and proceed. But I'm looking for a cleaner approach / best practice advice.

User response:

There are two steps for you to consider:

  • step_zv() removes variables that contain only a single value (zero variance)
  • step_nzv() removes variables that are highly sparse and unbalanced, meaning almost all of their values are the same (near-zero variance)
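
As a minimal sketch of how this could look in a recipe, the zero-variance filter can be placed ahead of the correlation filter so that constant columns are already gone when step_corr() runs. The toy data frame, its column names, and the 0.9 threshold below are assumptions made for illustration and do not come from the question. Because recipe steps are estimated when the recipe is prepped, the set of removed columns should be re-determined for each training window rather than fixed once.

```r
library(recipes)

# Hypothetical training window: `x3` happens to be constant here.
train <- data.frame(
  y  = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.7),
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2),  # strongly correlated with x1
  x3 = rep(7, 6)                          # constant within this window
)

# Drop zero-variance predictors first, then filter correlated ones.
rec <- recipe(y ~ ., data = train)
rec <- step_zv(rec, all_numeric_predictors())
rec <- step_corr(rec, all_numeric_predictors(), threshold = 0.9)

# Estimate the steps on the training window and apply them to it.
prepped <- prep(rec, training = train)
baked   <- bake(prepped, new_data = NULL)

# `x3` should no longer be present; step_corr() may also drop x1 or x2.
names(baked)
```

If near-constant columns are also a concern, step_nzv() could be swapped in for step_zv() in the same position.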