I want to standardise all my variables before applying machine learning methods. However, to my understanding, dummy variables should never be standardised. When I ran the following code, R standardised all my variables, even the binary ones. How can I avoid this?
#standardize all non-categorical variables to have mean zero and a standard deviation of one
df_standardized <- df %>% mutate(across(where(is.numeric), scale))
I checked my data types and they are "int", not numeric. Thank you in advance for your help.
CodePudding user response:
scale returns a matrix, which can be converted to a vector with either as.numeric or as.vector. In addition, is.numeric is TRUE for integer columns as well as doubles, so use inherits to modify only the columns whose class is "numeric":
library(dplyr)
# Scale only columns of class "numeric" (doubles); integer columns are left as-is
out <- df %>%
  mutate(across(where(~ inherits(.x, "numeric")),
                ~ as.numeric(scale(.x))))
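The two points above can be checked in base R with a small sketch. The vectors x_int and x_dbl are illustrative names, not part of the original question:

```r
x_int <- c(0L, 1L, 1L)      # illustrative integer dummy
x_dbl <- c(1.5, 2.5, 3.5)   # illustrative double column

is.numeric(x_int)            # TRUE: integers count as numeric for is.numeric()
inherits(x_int, "numeric")   # FALSE: the class of x_int is "integer"
inherits(x_dbl, "numeric")   # TRUE: the class of x_dbl is "numeric"

is.matrix(scale(x_dbl))              # TRUE: scale() returns a one-column matrix
is.matrix(as.numeric(scale(x_dbl)))  # FALSE: as.numeric() drops it to a plain vector
```

This is why where(is.numeric) also picks up "int" columns, while the inherits test does not.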
data
data(iris)
df <- iris
df$intCol <- 1L
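The inherits test relies on the dummies being stored as integers. If some dummies are stored as doubles, one possible variation is to skip any column that contains only 0/1 values. The sketch below assumes that is what "binary" means in your data; the helper is_scalable and the column dummy_dbl are hypothetical names introduced for illustration. Note that, unlike the inherits approach, this version would scale an integer column holding values other than 0 and 1.

```r
library(dplyr)

df <- iris
df$intCol <- 1L                                      # integer column from the example data
df$dummy_dbl <- as.numeric(df$Species == "setosa")   # hypothetical 0/1 dummy stored as double

# Hypothetical helper: TRUE for numeric columns that are not purely 0/1 (NAs allowed)
is_scalable <- function(x) is.numeric(x) && !all(x %in% c(0, 1, NA))

out <- df %>%
  mutate(across(where(is_scalable), ~ as.numeric(scale(.x))))

round(colMeans(out[1:4]), 10)            # means of the four scaled columns, numerically zero
sapply(out[1:4], sd)                     # standard deviations of the scaled columns
identical(out$intCol, df$intCol)         # integer column is left untouched
identical(out$dummy_dbl, df$dummy_dbl)   # double-valued dummy is left untouched
```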