R function that selects for numeric vectors and normalizes x to mean(x) = 0 and sd(x) = 1

Time:10-02

In R, I want to write a function normalize() that standardizes a numeric vector x to mean(x) = 0 and sd(x) = 1 and handles NAs flexibly, using tidyverse functionality.

Using the starwars dataset as an example, I tried to write a function that drops all columns not consisting of numeric values:

normalize <- function(x){
  x_numeric <-select_if(x, is.numeric(unlist(x)))
   (x_numeric - mean(x_numeric, na.rm = TRUE) / sd(x_numeric, na.rm = TRUE))
}

print(normalize(starwars))

I am quite new to R and get several error messages, for example:

Error in select_if(x, is.numeric(unlist(x))) : ✖ .p should have the same size as the number of variables in the tibble.

Answer:

We can use transmute() with across() to keep only the numeric columns and standardize each one:

library(dplyr)
starwars %>% 
   transmute(across(where(is.numeric),
      ~ (.x - mean(.x, na.rm = TRUE))/sd(.x, na.rm = TRUE)))

Or wrap it in a function:

normalize_dat <- function(data) {
  data %>%
    transmute(across(where(is.numeric),
                     ~ (.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE)))
}

Testing:

> normalize_dat(starwars)
# A tibble: 87 × 3
    height    mass birth_year
     <dbl>   <dbl>      <dbl>
 1 -0.0678 -0.120      -0.443
 2 -0.212  -0.132       0.158
 3 -2.25   -0.385      -0.353
 4  0.795   0.228      -0.295
 5 -0.701  -0.285      -0.443
 6  0.105   0.134      -0.230
 7 -0.269  -0.132      -0.262
 8 -2.22   -0.385      NA    
 9  0.249  -0.0786     -0.411
10  0.220  -0.120      -0.198
# … with 77 more rows
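
As a quick check, each column of the result should have mean 0 and standard deviation 1 once NAs are ignored. The following is only a sketch that reuses the pipeline from this answer; the name z is chosen here for illustration.

```r
library(dplyr)

# Standardize the numeric columns of starwars, as in the answer above
z <- starwars %>%
  transmute(across(where(is.numeric),
                   ~ (.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE)))

# Column means should be 0 and standard deviations 1,
# up to floating-point error
z %>%
  summarise(across(everything(),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        sd   = ~ sd(.x, na.rm = TRUE))))
```

The summary has one row with a _mean and an _sd column per variable, for example height_mean and height_sd.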

Or use select() and then scale(), which returns a matrix rather than a tibble:

starwars %>%
  select(where(is.numeric)) %>%
  scale()
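
The question also asks for a vector-level normalize() with flexible NA handling. Below is a minimal sketch of one way to do that, assuming an na.rm argument that is forwarded to mean() and sd(). The argument name and its default are choices made here, not something taken from the posts above.

```r
library(dplyr)

# Hypothetical helper: standardize one numeric vector to mean 0 and sd 1.
# na.rm is forwarded to mean() and sd(); NA entries stay NA in the result.
normalize <- function(x, na.rm = TRUE) {
  stopifnot(is.numeric(x))
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}

# Apply it to every numeric column and drop the rest
starwars %>%
  transmute(across(where(is.numeric), normalize))

# With na.rm = FALSE, any column containing an NA becomes all NA
starwars %>%
  transmute(across(where(is.numeric), ~ normalize(.x, na.rm = FALSE)))
```

Keeping the vector function separate from the column selection means normalize() can also be used on a single vector, for example normalize(starwars$height).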