I have a large data frame with several columns of various classes (character, factor and numeric). A subset of the data frame can be reproduced with the code below:
df <- structure(
list(
id = c("1", "2", "3", "4", "5"),
gender = structure(c(1L, 2L, 1L, 2L, 1L), levels = c("Female", "Male"), class = "factor"),
age = c(78, 64, 79, 98, 82),
score1 = c(-0.019375, -0.025835, -0.029842, -0.029842, -0.027398),
score2 = c(0.0004892, -0.001254932, -0.00135780, -0.00312374, -0.00685426),
score3 = c(-0.05938750, -0.1237563, -0.08442363, -0.09326243, -0.091492836)),
row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
I would like to standardize score1, score2 and score3 using the base scale function, add the results to the data frame under new names (the original name prefixed with a z), and keep the original scores in the data frame.
So far I have created the standardized scores with the code below, but I would like to use a function or a loop to avoid repeating the same line, as my data frame has many more scores to standardize.
df$zscore1 <- scale(df$score1, center = TRUE, scale = TRUE)
df$zscore2 <- scale(df$score2, center = TRUE, scale = TRUE)
df$zscore3 <- scale(df$score3, center = TRUE, scale = TRUE)
Any suggestions on how to solve this?
Edit:
Sotos' solution works perfectly for the example I provided. However, the names of the score columns are not as organized as in the provided sample; I apologize for that. They are more like this:
df <- structure(
list(
id = c("1", "2", "3", "4", "5"),
gender = structure(c(1L, 2L, 1L, 2L, 1L), levels = c("Female", "Male"), class = "factor"),
age = c(78, 64, 79, 98, 82),
AD = c(-0.019375, -0.025835, -0.029842, -0.029842, -0.027398),
PD1 = c(0.0004892, -0.001254932, -0.00135780, -0.00312374, -0.00685426),
DEM = c(-0.05938750, -0.1237563, -0.08442363, -0.09326243, -0.091492836)),
row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
And the output I seek is as below:
df$zAD_2 <- scale(df$AD, center = TRUE, scale = TRUE)
df$zPD1_2 <- scale(df$PD1, center = TRUE, scale = TRUE)
df$zDEM_2 <- scale(df$DEM, center = TRUE, scale = TRUE)
CodePudding user response:
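You can try the following. The negative index drops the first three columns (id, gender, age), so every remaining column is scaled:
# Negative positions: exclude columns 1 to 3 and keep all the score columns
i1 <- -seq(3)
# scale() standardizes each selected column; results are added as z<name>_2
df[paste0('z', names(df)[i1], '_2')] <- scale(df[i1], center = TRUE, scale = TRUE)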
df
# A tibble: 5 x 9
  id    gender   age      AD       PD1     DEM  zAD_2 zPD1_2  zDEM_2
  <chr> <fct>  <dbl>   <dbl>     <dbl>   <dbl>  <dbl>  <dbl>   <dbl>
1 1     Female    78 -0.0194  0.000489 -0.0594   1.64   1.04    1.35
2 2     Male      64 -0.0258  -0.00125  -0.124  0.145  0.418   -1.45
3 3     Female    79 -0.0298  -0.00136 -0.0844 -0.785  0.381   0.262
4 4     Male      98 -0.0298  -0.00312 -0.0933 -0.785 -0.252  -0.122
5 5     Female    82 -0.0274  -0.00685 -0.0915 -0.218  -1.59 -0.0447
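If the score columns are not guaranteed to come after the first three columns, selecting them by name may be more robust than selecting by position. The following is only a sketch under that assumption: `score_cols` is a hypothetical vector you would fill in yourself, and a plain data.frame with a subset of the example columns stands in for the tibble. It also wraps `scale()` in `as.numeric()`, because `scale()` on a single vector returns a one-column matrix rather than a plain numeric vector.

```r
# Reduced copy of the example data (plain data.frame, for illustration only)
df <- data.frame(
  id  = c("1", "2", "3", "4", "5"),
  AD  = c(-0.019375, -0.025835, -0.029842, -0.029842, -0.027398),
  PD1 = c(0.0004892, -0.001254932, -0.00135780, -0.00312374, -0.00685426),
  DEM = c(-0.05938750, -0.1237563, -0.08442363, -0.09326243, -0.091492836)
)

# Hypothetical: names of the columns to standardize; adjust to your data
score_cols <- c("AD", "PD1", "DEM")

# Standardize each column; as.numeric() drops the matrix attributes
# that scale() attaches to its result
z_cols <- lapply(df[score_cols], function(x) {
  as.numeric(scale(x, center = TRUE, scale = TRUE))
})

# Add the results under new names: "z" prefix and "_2" suffix
df[paste0("z", score_cols, "_2")] <- z_cols
```

The original columns are left untouched, and each new column is an ordinary numeric vector with mean 0 and standard deviation 1.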