I have a dataframe that looks something like this.
User | V1 | V2 | V3 |
---|---|---|---|
Jim | .34 | .33 | .88 |
David | .54 | .34 | .71 |
Scott | .12 | .25 | .12 |
Frank | .76 | .76 | .44 |
Doug | .68 | .09 | .54 |
Tom | .91 | .67 | .92 |
I would like to calculate new variables (V1_DISTfromMEAN, V2_DISTfromMEAN, V3_DISTfromMEAN) by subtracting each column's mean from every observation in the corresponding variable (V1, V2, V3). For example, the mean of column V1 is about .55, so for Jim the calculation for V1_DISTfromMEAN would be .34 - .55 = -0.21. The resulting dataframe would look something like the one below, with all values filled in.
User | V1 | V2 | V3 | V1_DISTfromMEAN | V2_DISTfromMEAN | V3_DISTfromMEAN |
---|---|---|---|---|---|---|
Jim | .34 | .33 | .88 | - .21 | ??? | ??? |
David | .54 | .34 | .71 | - .01 | ??? | ??? |
Scott | .12 | .25 | .12 | ??? | ??? | ??? |
Frank | .76 | .76 | .44 | ??? | ??? | ??? |
Doug | .68 | .09 | .54 | ??? | ??? | ??? |
Tom | .91 | .67 | .92 | ??? | ??? | ??? |
Any help would be greatly appreciated. Let me know if I've failed to include all the necessary data.
CodePudding user response:
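Use colMeans to get a vector of column means and subtract it from the numeric columns of the input data set. Because R recycles vectors in column-major order, transpose first so that each mean lines up with its own column, then transpose back. Finally, bind the result to the original data.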
df1 <- read.table(text = "
User V1 V2 V3
Jim .34 .33 .88
David .54 .34 .71
Scott .12 .25 .12
Frank .76 .76 .44
Doug .68 .09 .54
Tom .91 .67 .92
", header = TRUE)
mu <- colMeans(df1[-1])
tmp <- t(t(df1[-1]) - mu)
colnames(tmp) <- paste(colnames(tmp), "DISTfromMEAN", sep = "_")
df2 <- cbind(df1, tmp)
rm(tmp)
df2
#> User V1 V2 V3 V1_DISTfromMEAN V2_DISTfromMEAN V3_DISTfromMEAN
#> 1 Jim 0.34 0.33 0.88 -0.21833333 -0.07666667 0.27833333
#> 2 David 0.54 0.34 0.71 -0.01833333 -0.06666667 0.10833333
#> 3 Scott 0.12 0.25 0.12 -0.43833333 -0.15666667 -0.48166667
#> 4 Frank 0.76 0.76 0.44 0.20166667 0.35333333 -0.16166667
#> 5 Doug 0.68 0.09 0.54 0.12166667 -0.31666667 -0.06166667
#> 6 Tom 0.91 0.67 0.92 0.35166667 0.26333333 0.31833333
Created on 2022-03-02 by the reprex package (v2.0.1)
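As a side note on the column-major point in the answer above, here is a minimal sketch of two base R alternatives to the double transpose, sweep() and scale(), together with the naive subtraction that goes wrong. The names m, centered, centered2 and naive are made up for this illustration, and the data frame is rebuilt with data.frame() rather than read.table().

```r
# Same data as in the answer above.
df1 <- data.frame(
  User = c("Jim", "David", "Scott", "Frank", "Doug", "Tom"),
  V1 = c(0.34, 0.54, 0.12, 0.76, 0.68, 0.91),
  V2 = c(0.33, 0.34, 0.25, 0.76, 0.09, 0.67),
  V3 = c(0.88, 0.71, 0.12, 0.44, 0.54, 0.92)
)

m  <- as.matrix(df1[-1])  # numeric columns only
mu <- colMeans(m)         # one mean per column

# sweep() with MARGIN = 2 subtracts mu[j] from every entry of column j.
centered <- sweep(m, 2, mu, FUN = "-")

# scale() with scale = FALSE only centers; it computes the column means itself.
centered2 <- scale(m, center = TRUE, scale = FALSE)

# Naive subtraction recycles mu down the rows (column-major order),
# so row 2 of V1 gets the V2 mean subtracted: not what we want.
naive <- m - mu

# Attach the centered columns under the requested names.
colnames(centered) <- paste(colnames(centered), "DISTfromMEAN", sep = "_")
df2 <- cbind(df1, centered)
```

Both sweep() and scale() should give the same numbers as t(t(df1[-1]) - mu); which one reads better is largely a matter of taste.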
CodePudding user response:
We could use across:
library(dplyr)
# df is the data frame from the question
df %>%
  mutate(across(-User, ~ . - mean(.), .names = "{.col}_DISTfromMEAN"))
User V1 V2 V3 V1_DISTfromMEAN V2_DISTfromMEAN V3_DISTfromMEAN
1 Jim 0.34 0.33 0.88 -0.21833333 -0.07666667 0.27833333
2 David 0.54 0.34 0.71 -0.01833333 -0.06666667 0.10833333
3 Scott 0.12 0.25 0.12 -0.43833333 -0.15666667 -0.48166667
4 Frank 0.76 0.76 0.44 0.20166667 0.35333333 -0.16166667
5 Doug 0.68 0.09 0.54 0.12166667 -0.31666667 -0.06166667
6 Tom 0.91 0.67 0.92 0.35166667 0.26333333 0.31833333
CodePudding user response:
A possible solution, based on dplyr:
library(dplyr)
df <- data.frame(
  stringsAsFactors = FALSE,
  User = c("Jim", "David", "Scott", "Frank", "Doug", "Tom"),
  V1 = c(0.34, 0.54, 0.12, 0.76, 0.68, 0.91),
  V2 = c(0.33, 0.34, 0.25, 0.76, 0.09, 0.67),
  V3 = c(0.88, 0.71, 0.12, 0.44, 0.54, 0.92)
)
df %>%
  mutate(across(-1, ~ .x - mean(.x), .names = "{.col}_DISTfromMEAN"))
#> User V1 V2 V3 V1_DISTfromMEAN V2_DISTfromMEAN V3_DISTfromMEAN
#> 1 Jim 0.34 0.33 0.88 -0.21833333 -0.07666667 0.27833333
#> 2 David 0.54 0.34 0.71 -0.01833333 -0.06666667 0.10833333
#> 3 Scott 0.12 0.25 0.12 -0.43833333 -0.15666667 -0.48166667
#> 4 Frank 0.76 0.76 0.44 0.20166667 0.35333333 -0.16166667
#> 5 Doug 0.68 0.09 0.54 0.12166667 -0.31666667 -0.06166667
#> 6 Tom 0.91 0.67 0.92 0.35166667 0.26333333 0.31833333
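If dplyr is not available, the same per-column centering can be sketched in base R with lapply(). This is an alternative to across(), not the dplyr approach itself, and num_cols and new_cols are names invented for this example.

```r
df <- data.frame(
  stringsAsFactors = FALSE,
  User = c("Jim", "David", "Scott", "Frank", "Doug", "Tom"),
  V1 = c(0.34, 0.54, 0.12, 0.76, 0.68, 0.91),
  V2 = c(0.33, 0.34, 0.25, 0.76, 0.09, 0.67),
  V3 = c(0.88, 0.71, 0.12, 0.44, 0.54, 0.92)
)

num_cols <- names(df)[-1]                       # every column except User
new_cols <- paste0(num_cols, "_DISTfromMEAN")   # names for the new columns

# Center each numeric column on its own mean and add the results as new columns.
df[new_cols] <- lapply(df[num_cols], function(x) x - mean(x))

# Sanity check: centered columns should have means of (numerically) zero.
colMeans(df[new_cols])
```

Like the across() versions, this assumes the columns contain no missing values; with NAs, mean(x, na.rm = TRUE) would be one way to handle them.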