Scale and center multiple columns in R adding new names to the new columns

Time:05-20

I have a large data frame with several columns of various classes (character, factor and numeric). A subset of the data frame can be reproduced with the code below:

df <- structure(
  list(
                  id = c("1", "2", "3", "4", "5"), 
                  gender = structure(c(1L, 2L, 1L, 2L, 1L), levels = c("Female", "Male"), class = "factor"), 
                  age = c(78, 64, 79, 98, 82),
                  score1 = c(-0.019375, -0.025835, -0.029842, -0.029842, -0.027398),
                  score2 = c(0.0004892, -0.001254932, -0.00135780, -0.00312374, -0.00685426), 
                  score3 = c(-0.05938750, -0.1237563, -0.08442363, -0.09326243, -0.091492836)),
  row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))

I would like to standardize score1, score2 and score3 using the base R scale function, add the results to the data frame under new names (a z prefixed to each original name) and keep the original scores in the data frame.

So far I have created the standardized scores with the code below, but I would like to use a function or a loop to avoid repeating the same line for every column, as my data frame has several more scores to standardize.

df$zscore1 <- scale(df$score1, center = TRUE, scale = TRUE)
df$zscore2 <- scale(df$score2, center = TRUE, scale = TRUE)
df$zscore3 <- scale(df$score3, center = TRUE, scale = TRUE)

Any suggestions how to solve this?

Edit:

Sotos' solution works perfectly for the example I provided. However, the names of the score columns are not as organized as in the provided sample. I apologize for that. They are more like this:

df <- structure(
  list(
                  id = c("1", "2", "3", "4", "5"), 
                  gender = structure(c(1L, 2L, 1L, 2L, 1L), levels = c("Female", "Male"), class = "factor"), 
                  age = c(78, 64, 79, 98, 82),
                  AD = c(-0.019375, -0.025835, -0.029842, -0.029842, -0.027398),
                  PD1 = c(0.0004892, -0.001254932, -0.00135780, -0.00312374, -0.00685426), 
                  DEM = c(-0.05938750, -0.1237563, -0.08442363, -0.09326243, -0.091492836)),
  row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))

And the output I seek is as below:

df$zAD_2 <- scale(df$AD, center = TRUE, scale = TRUE)
df$zPD1_2 <- scale(df$PD1, center = TRUE, scale = TRUE)
df$zDEM_2 <- scale(df$DEM, center = TRUE, scale = TRUE)

Answer:

You can try,

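# Negative index: drop the first three columns (id, gender, age), keeping only the scores
i1 <- -seq(3)
# Scale the remaining columns and add them back as z<name>_2
df[paste0('z', names(df)[i1], '_2')] <- scale(df[i1], center = TRUE, scale = TRUE)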

df
# A tibble: 5 x 9
  id    gender   age      AD       PD1     DEM  zAD_2  zPD1_2   zDEM_2
  <chr> <fct>  <dbl>   <dbl>     <dbl>   <dbl>  <dbl>  <dbl>   <dbl>
1 1     Female    78 -0.0194  0.000489 -0.0594  1.64   1.04   1.35  
2 2     Male      64 -0.0258 -0.00125  -0.124   0.145  0.418 -1.45  
3 3     Female    79 -0.0298 -0.00136  -0.0844 -0.785  0.381  0.262 
4 4     Male      98 -0.0298 -0.00312  -0.0933 -0.785 -0.252 -0.122 
5 5     Female    82 -0.0274 -0.00685  -0.0915 -0.218 -1.59  -0.0447
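
If the score columns are not guaranteed to sit after the first three columns, selecting them by name may be safer than selecting by position. Below is a minimal base R sketch of that idea, not part of the original answer: the helper name add_zscores and its prefix/suffix arguments are hypothetical, and a plain data.frame is used instead of a tibble so that no package is needed. It also wraps scale() in as.numeric(), since scale() returns a one-column matrix when given a single vector.

```r
# Hypothetical helper (an illustration, not from the original answer):
# standardize the named columns and append them as <prefix><name><suffix>,
# leaving the original columns untouched.
add_zscores <- function(data, cols, prefix = "z", suffix = "_2") {
  for (col in cols) {
    new_name <- paste0(prefix, col, suffix)
    # scale() returns a matrix; as.numeric() converts it to a plain vector
    data[[new_name]] <- as.numeric(scale(data[[col]], center = TRUE, scale = TRUE))
  }
  data
}

# Same values as the second example in the question, as a plain data.frame
df <- data.frame(
  id = c("1", "2", "3", "4", "5"),
  gender = factor(c("Female", "Male", "Female", "Male", "Female")),
  age = c(78, 64, 79, 98, 82),
  AD = c(-0.019375, -0.025835, -0.029842, -0.029842, -0.027398),
  PD1 = c(0.0004892, -0.001254932, -0.00135780, -0.00312374, -0.00685426),
  DEM = c(-0.05938750, -0.1237563, -0.08442363, -0.09326243, -0.091492836)
)

# Pick the columns to standardize by name instead of by position
score_cols <- c("AD", "PD1", "DEM")
df <- add_zscores(df, score_cols)
names(df)
```

Passing a character vector of names means that adding or reordering other columns later should not change which columns get standardized.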