Centering only the columns selected, leaving other columns intact

Time:03-02

Example data:

structure(list(ID = c(1, 2, 3, 4, 5), x1 = c(2.6, 3.8, 2.6, 4.3, 
2.6), x2 = c(3.2, 3.2, 3.2, 4, 3.2), x3 = c(4.5, 3.5, 4.5, 3, 
4.5), x4 = c(2.4, 3, 2.4, 3.1, 2.4), x5 = c(3.8, 2.6, 4.3, 2.6, 
4.4), x6 = c(3.2, 3.2, 4, 3.2, 2.2), x7 = c(3.5, 4.5, 3, 4.5, 
4), x8 = c(3, 2.4, 3.1, 2.4, 4.3), x9 = c(3.9, 4, 4, 4, 3.9), 
    x10 = c(4, 3.9, 4, 4, 4)), class = "data.frame", row.names = c(NA, 
-5L), variable.labels = structure(character(0), .Names = character(0)), codepage = 65001L)

I'm trying to mean-center only columns 6:10. How can I change the code below so that it keeps all 11 columns of the original data and appends the newly centered columns (16 columns in total)? Currently, the code below returns only the selected columns along with their centered versions:

center.scale <- function(c) {
  # center each column on its mean, without rescaling
  x <- scale(c, center = TRUE, scale = FALSE)
  colnames(x) <- paste0(colnames(x), "_c")
  cbind(c, x)
}

centered_data <- center.scale(original_data[,c(6:10)])

p.s.

I have also tried the code below, but it keeps producing centering results that are inaccurate.

data2 <- original_data %>%
  mutate_at(.vars = colnames(original_data[6:10]),
            .funs = list("c" = center_scale))

CodePudding user response:

We can pass the full data frame and a column index into the function and do the subsetting inside:

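center.scale <- function(dat, ind) {
  # drop = FALSE keeps the column names even if ind selects a single column
  x <- scale(dat[, ind, drop = FALSE], center = TRUE, scale = FALSE)
  colnames(x) <- paste0(colnames(x), "_c")
  # append the centered columns to the untouched original data
  cbind(dat, x)
}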

Testing (note that the index 5:10 used here also centers x4; use 6:10 for the columns in the question):

> center.scale(original_data, 5:10)
  ID  x1  x2  x3  x4  x5  x6  x7  x8  x9 x10  x4_c  x5_c  x6_c x7_c  x8_c  x9_c
1  1 2.6 3.2 4.5 2.4 3.8 3.2 3.5 3.0 3.9 4.0 -0.26  0.26  0.04 -0.4 -0.04 -0.06
2  2 3.8 3.2 3.5 3.0 2.6 3.2 4.5 2.4 4.0 3.9  0.34 -0.94  0.04  0.6 -0.64  0.04
3  3 2.6 3.2 4.5 2.4 4.3 4.0 3.0 3.1 4.0 4.0 -0.26  0.76  0.84 -0.9  0.06  0.04
4  4 4.3 4.0 3.0 3.1 2.6 3.2 4.5 2.4 4.0 4.0  0.44 -0.94  0.04  0.6 -0.64  0.04
5  5 2.6 3.2 4.5 2.4 4.4 2.2 4.0 4.3 3.9 4.0 -0.26  0.86 -0.96  0.1  1.26 -0.06
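
As a quick sanity check, the sketch below rebuilds the example data (the name `original_data` is taken from the question; the `variable.labels` and `codepage` attributes are dropped for brevity), applies a copy of the function with the same logic to columns 6:10 as originally asked, and checks that the result has 16 columns and that each appended column has a mean of zero up to floating-point error. The helper names `res` and `centered` are only illustrative.

```r
# Example data from the question (extra attributes omitted)
original_data <- data.frame(
  ID  = c(1, 2, 3, 4, 5),
  x1  = c(2.6, 3.8, 2.6, 4.3, 2.6),
  x2  = c(3.2, 3.2, 3.2, 4, 3.2),
  x3  = c(4.5, 3.5, 4.5, 3, 4.5),
  x4  = c(2.4, 3, 2.4, 3.1, 2.4),
  x5  = c(3.8, 2.6, 4.3, 2.6, 4.4),
  x6  = c(3.2, 3.2, 4, 3.2, 2.2),
  x7  = c(3.5, 4.5, 3, 4.5, 4),
  x8  = c(3, 2.4, 3.1, 2.4, 4.3),
  x9  = c(3.9, 4, 4, 4, 3.9),
  x10 = c(4, 3.9, 4, 4, 4)
)

center.scale <- function(dat, ind) {
  # drop = FALSE keeps the column names even for a single column
  x <- scale(dat[, ind, drop = FALSE], center = TRUE, scale = FALSE)
  colnames(x) <- paste0(colnames(x), "_c")
  cbind(dat, x)
}

res <- center.scale(original_data, 6:10)

# 11 original columns plus 5 centered ones
ncol(res)

# Every appended column should average to (numerically) zero
centered <- res[, grep("_c$", names(res))]
all(abs(colMeans(centered)) < 1e-12)
```

If the last line returns `TRUE`, the appended columns are centered correctly and the original columns are left untouched.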

CodePudding user response:

You can do:

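library(tidyverse)
# df is the example data frame from the question.
# A lambda is used because passing extra arguments through across() is
# deprecated in recent dplyr versions; as.numeric() drops the matrix
# attributes that scale() returns.
df %>%
  mutate(across(x6:x10, ~ as.numeric(scale(.x, scale = FALSE)), .names = '{.col}_c'))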

  ID  x1  x2  x3  x4  x5  x6  x7  x8  x9 x10  x6_c x7_c  x8_c  x9_c x10_c
1  1 2.6 3.2 4.5 2.4 3.8 3.2 3.5 3.0 3.9 4.0  0.04 -0.4 -0.04 -0.06  0.02
2  2 3.8 3.2 3.5 3.0 2.6 3.2 4.5 2.4 4.0 3.9  0.04  0.6 -0.64  0.04 -0.08
3  3 2.6 3.2 4.5 2.4 4.3 4.0 3.0 3.1 4.0 4.0  0.84 -0.9  0.06  0.04  0.02
4  4 4.3 4.0 3.0 3.1 2.6 3.2 4.5 2.4 4.0 4.0  0.04  0.6 -0.64  0.04  0.02
5  5 2.6 3.2 4.5 2.4 4.4 2.2 4.0 4.3 3.9 4.0 -0.96  0.1  1.26 -0.06  0.02
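
Note that selecting by position and selecting by name pick different columns here: because `ID` occupies position 1, positions 6:10 correspond to `x5` through `x9`, whereas `x6:x10` refers to positions 7:11. A small base-R sketch of that mapping, using only the column names of the example data (the vector name `cols` is just for illustration):

```r
# Column names of the example data, in order
cols <- c("ID", paste0("x", 1:10))

# Names found at positions 6 to 10
cols[6:10]
# "x5" "x6" "x7" "x8" "x9"

# Positions of the columns named x6 to x10
match(paste0("x", 6:10), cols)
# 7 8 9 10 11
```

Selecting by name is probably the safer choice if the column order might change later, but either works as long as the selection matches the columns you intend to center.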