Calculate new dataframe variables with distance from column mean

Time:03-03

I have a dataframe that looks something like this.

User V1 V2 V3
Jim .34 .33 .88
David .54 .34 .71
Scott .12 .25 .12
Frank .76 .76 .44
Doug .68 .09 .54
Tom .91 .67 .92

But I would like to calculate new variables. I want the new variables (V1_DISTfromMEAN, V2_DISTfromMEAN, V3_DISTfromMEAN) to be calculated by subtracting the column's mean value from each observation of the corresponding variable (V1, V2, V3). For example, the mean for the column V1 is about .55, so for Jim I would want the equation to be .34 - .55 = -0.21 for V1_DISTfromMEAN. The resulting dataframe would look something like the one below, with all values filled in.

User V1 V2 V3 V1_DISTfromMEAN V2_DISTfromMEAN V3_DISTfromMEAN
Jim .34 .33 .88 -.21 ??? ???
David .54 .34 .71 -.01 ??? ???
Scott .12 .25 .12 ??? ??? ???
Frank .76 .76 .44 ??? ??? ???
Doug .68 .09 .54 ??? ??? ???
Tom .91 .67 .92 ??? ??? ???

Any help would be greatly appreciated. Let me know if I've failed to include all the necessary data.

CodePudding user response:

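Use colMeans() to get a vector of column means and subtract it from the numeric columns. Because R stores matrices in column-major order and recycles the vector down the columns, transpose first so that each mean lines up with its own variable, then transpose back. Finally, bind the result to the original data.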

df1 <- read.table(text = "
User    V1  V2  V3
Jim     .34     .33     .88
David   .54     .34     .71
Scott   .12     .25     .12
Frank   .76     .76     .44
Doug    .68     .09     .54
Tom     .91     .67     .92
", header = TRUE)

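# column means of the numeric columns (everything except User)
mu <- colMeans(df1[-1])
# transpose so the variables are rows, subtract the means, transpose back
tmp <- t(t(df1[-1]) - mu)
# name the new columns after the originals
colnames(tmp) <- paste(colnames(tmp), "DISTfromMEAN", sep = "_")
# append the centered columns to the original data
df2 <- cbind(df1, tmp)
rm(tmp)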

df2
#>    User   V1   V2   V3 V1_DISTfromMEAN V2_DISTfromMEAN V3_DISTfromMEAN
#> 1   Jim 0.34 0.33 0.88     -0.21833333     -0.07666667      0.27833333
#> 2 David 0.54 0.34 0.71     -0.01833333     -0.06666667      0.10833333
#> 3 Scott 0.12 0.25 0.12     -0.43833333     -0.15666667     -0.48166667
#> 4 Frank 0.76 0.76 0.44      0.20166667      0.35333333     -0.16166667
#> 5  Doug 0.68 0.09 0.54      0.12166667     -0.31666667     -0.06166667
#> 6   Tom 0.91 0.67 0.92      0.35166667      0.26333333      0.31833333

Created on 2022-03-02 by the reprex package (v2.0.1)
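
As a possible alternative to the double transpose, base R's sweep() can subtract one statistic per column directly. The following is only a minimal sketch: it rebuilds the question's data inline so it runs on its own, and the intermediate names num and centered are placeholders chosen for illustration, not part of the answer above.

```r
# Assumption: the same toy data as in the question, rebuilt here so the block is self-contained
df1 <- data.frame(
  User = c("Jim", "David", "Scott", "Frank", "Doug", "Tom"),
  V1 = c(0.34, 0.54, 0.12, 0.76, 0.68, 0.91),
  V2 = c(0.33, 0.34, 0.25, 0.76, 0.09, 0.67),
  V3 = c(0.88, 0.71, 0.12, 0.44, 0.54, 0.92)
)

# numeric columns only (drop User)
num <- df1[-1]

# MARGIN = 2 applies the statistics column-wise; the default FUN is "-"
centered <- sweep(num, 2, colMeans(num))
names(centered) <- paste(names(centered), "DISTfromMEAN", sep = "_")

# append the centered columns to the original data
df2 <- cbind(df1, centered)
df2
```

With MARGIN = 2, sweep() handles the alignment of each mean with its column, so no transposition is needed.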

CodePudding user response:

We could use across() from dplyr (df is the data frame from the question):

library(dplyr)

df %>% 
  mutate(across(-User, ~ . - mean(.), .names = "{.col}_DISTfromMEAN"))

#>    User   V1   V2   V3 V1_DISTfromMEAN V2_DISTfromMEAN V3_DISTfromMEAN
#> 1   Jim 0.34 0.33 0.88     -0.21833333     -0.07666667      0.27833333
#> 2 David 0.54 0.34 0.71     -0.01833333     -0.06666667      0.10833333
#> 3 Scott 0.12 0.25 0.12     -0.43833333     -0.15666667     -0.48166667
#> 4 Frank 0.76 0.76 0.44      0.20166667      0.35333333     -0.16166667
#> 5  Doug 0.68 0.09 0.54      0.12166667     -0.31666667     -0.06166667
#> 6   Tom 0.91 0.67 0.92      0.35166667      0.26333333      0.31833333

CodePudding user response:

A possible solution, based on dplyr:

library(dplyr)

df <- data.frame(
  stringsAsFactors = FALSE,
  User = c("Jim", "David", "Scott", "Frank", "Doug", "Tom"),
  V1 = c(0.34, 0.54, 0.12, 0.76, 0.68, 0.91),
  V2 = c(0.33, 0.34, 0.25, 0.76, 0.09, 0.67),
  V3 = c(0.88, 0.71, 0.12, 0.44, 0.54, 0.92)
)

df %>% 
  mutate(across(-1, ~ .x - mean(.x), .names = "{.col}_DISTfromMEAN"))

#>    User   V1   V2   V3 V1_DISTfromMEAN V2_DISTfromMEAN V3_DISTfromMEAN
#> 1   Jim 0.34 0.33 0.88     -0.21833333     -0.07666667      0.27833333
#> 2 David 0.54 0.34 0.71     -0.01833333     -0.06666667      0.10833333
#> 3 Scott 0.12 0.25 0.12     -0.43833333     -0.15666667     -0.48166667
#> 4 Frank 0.76 0.76 0.44      0.20166667      0.35333333     -0.16166667
#> 5  Doug 0.68 0.09 0.54      0.12166667     -0.31666667     -0.06166667
#> 6   Tom 0.91 0.67 0.92      0.35166667      0.26333333      0.31833333
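
If dplyr is not available, a similar result can probably be obtained in base R with scale(), which centers columns when center = TRUE and skips the division by the standard deviation when scale = FALSE. This is a hedged sketch rather than part of the answers above; the names centered, out and max_abs_mean are illustrative. It also includes a quick sanity check: columns of distances from the mean should themselves average to zero, up to floating-point error.

```r
# Assumption: the same data frame as in the answer above, repeated so the block runs on its own
df <- data.frame(
  User = c("Jim", "David", "Scott", "Frank", "Doug", "Tom"),
  V1 = c(0.34, 0.54, 0.12, 0.76, 0.68, 0.91),
  V2 = c(0.33, 0.34, 0.25, 0.76, 0.09, 0.67),
  V3 = c(0.88, 0.71, 0.12, 0.44, 0.54, 0.92)
)

# center = TRUE subtracts each column's mean; scale = FALSE leaves the spread untouched
centered <- scale(df[-1], center = TRUE, scale = FALSE)
colnames(centered) <- paste0(colnames(centered), "_DISTfromMEAN")

# cbind() splits the centered matrix into ordinary data frame columns
out <- cbind(df, centered)

# sanity check: each new column should have a mean of (numerically) zero
max_abs_mean <- max(abs(colMeans(out[5:7])))
stopifnot(max_abs_mean < 1e-12)
```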