Writing an R loop to create new standardized columns

Time:12-08

I'm using the Ionosphere dataset in R and am trying to write a loop that creates new columns holding standardized versions of the existing columns, and names them accordingly.

I store the new column name in "cname", and c is the name of the original column. The code is:

install.packages("mlbench") 
library(mlbench) 
data('Ionosphere')
library(robustHD)
col <- colnames(Ionosphere)
for (c in col[1:length(col)-1]){
  cname <- paste(c,"Std")
  Ionosphere$cname <- standardize(Ionosphere$c)

  }

But I get the following error:

"Error in `$<-.data.frame`(`*tmp*`, "cname", value = numeric(0)) : 
  replacement has 0 rows, data has 351
In addition: Warning message:
In mean.default(x) : argument is not numeric or logical: returning NA"

I feel like there's something super-simple I'm missing but I just can't see it.

Any help gratefully received.

CodePudding user response:

We can use lapply, a custom-made standardization function, setNames, and cbind. I do not have access to your dataset, so I am using the iris dataset as an example:

df <- iris
# Standardize the four numeric columns and append them under new names
cbind(df, setNames(lapply(df[1:4],
                          \(x) (x - mean(x)) / sd(x)),
                   paste0(names(df[1:4]), "_Std")))

    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species Sepal.Length_Std Sepal.Width_Std Petal.Length_Std Petal.Width_Std
1            5.1         3.5          1.4         0.2     setosa      -0.89767388      1.01560199      -1.33575163   -1.3110521482
2            4.9         3.0          1.4         0.2     setosa      -1.13920048     -0.13153881      -1.33575163   -1.3110521482
3            4.7         3.2          1.3         0.2     setosa      -1.38072709      0.32731751      -1.39239929   -1.3110521482
4            4.6         3.1          1.5         0.2     setosa      -1.50149039      0.09788935      -1.27910398   -1.3110521482
5            5.0         3.6          1.4         0.2     setosa      -1.01843718      1.24503015      -1.33575163   -1.3110521482
...

I feel these transformations get easier with dplyr:

library(dplyr)

iris %>% mutate(across(where(is.numeric),
                       ~ (.x - mean(.x)) / sd(.x),
                       .names = "{.col}_Std"))
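
As for why the original loop fails: $ does not evaluate its right-hand side, so Ionosphere$c looks for a column literally named "c" and returns NULL, and Ionosphere$cname would assign to a column literally named "cname". Indexing with [[ ]] uses the value of the variable instead. Below is a minimal sketch of a repaired loop, again with iris as a stand-in for your data. It swaps standardize() from robustHD for the manual z-score formula so that it runs without extra packages, and the names num_cols, nm, new_name, and x are only illustrative. It also skips non-numeric columns, on the assumption that your data contains some (as iris does with Species), since mean() and sd() cannot handle factors.

```r
# Repaired version of the loop; iris stands in for the real dataset.
df <- iris

# Keep only the names of numeric columns.
num_cols <- names(df)[vapply(df, is.numeric, logical(1))]

for (nm in num_cols) {
  new_name <- paste0(nm, "_Std")           # paste0() joins without a space; paste() would insert one
  x <- df[[nm]]                            # [[ ]] looks up the column whose name is stored in nm
  df[[new_name]] <- (x - mean(x)) / sd(x)  # z-score: subtract the mean, divide by the standard deviation
}

head(df)
```

Replacing iris with Ionosphere should be the only change needed for your case, though I have not run it on that dataset. Selecting columns by type also avoids the indexing expression col[1:length(col)-1], which R parses as col[(1:length(col)) - 1] because : binds more tightly than -.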