Creating Several New Scaled Variables at Once (additive or mean)

Say I want to create two new variables, "mean1" and "mean2", where "mean1" is the average of "var1" and "var2", and "mean2" is the average of "var3", "var4", and "var5". Here is an example data frame:

set.seed(23424)
df <- data.frame(var1 = runif(5, 0, 5),
                 var2 = runif(5, 0, 5),
                 var3 = runif(5, 0, 5),
                 var4 = runif(5, 0, 5),
                 var5 = runif(5, 0, 5))

I COULD brute force it with something like:

df$mean1 <- rowMeans(df[,1:2])
df$mean2 <- rowMeans(df[,3:5])

But if I had to do this a lot, it would get tedious and clunky. Is there a way to do this more efficiently? When I try to use loops or apply statements for this, it never works correctly.

Thanks in advance!

CodePudding user response：

You could do something like this, where the variables you want to aggregate are kept in a named list. There's probably a better, fully tidy way of doing this, but this works:

library(dplyr)
set.seed(23424)
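df <- data.frame(var1 = runif(5, 0, 5),
                 var2 = runif(5, 0, 5),
                 var3 = runif(5, 0, 5),
                 var4 = runif(5, 0, 5),
                 var5 = runif(5, 0, 5))

# Each list element: new variable name -> columns to average
l <- list(mean1 = c("var1", "var2"),
          mean2 = c("var3", "var4", "var5"))

# seq_along() is safer than 1:length(l) if the list is ever empty
for(i in seq_along(l)){
  # cur_data() is deprecated as of dplyr 1.1.0; there,
  # rowMeans(pick(all_of(l[[i]]))) does the same job
  df <- df %>%
    mutate(!!sym(names(l)[i]) := rowMeans(cur_data()[,l[[i]]]))
}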
df
#>       var1      var2      var3      var4      var5    mean1     mean2
#> 1 1.892178 0.2488837 4.3203682 2.6517051 1.2454473 1.070531 2.7391735
#> 2 1.390501 2.9131956 0.8851525 3.9931125 1.8389664 2.151848 2.2390771
#> 3 3.131567 4.8579541 0.1950122 3.9789130 4.6969826 3.994761 2.9569693
#> 4 4.425019 2.5628706 0.6257656 0.1144681 1.8303231 3.493945 0.8568523
#> 5 2.621068 4.7636304 1.2762756 1.1706242 0.1881539 3.692349 0.8783512

Created on 2022-05-11 by the reprex package (v2.0.1)
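
On the "better way" point: a base R sketch that avoids the explicit loop might look like the following. It assumes the same named list `l` as above and that all listed columns are numeric; it is one possible approach, not necessarily the tidy one hinted at.

```r
set.seed(23424)
df <- data.frame(var1 = runif(5, 0, 5),
                 var2 = runif(5, 0, 5),
                 var3 = runif(5, 0, 5),
                 var4 = runif(5, 0, 5),
                 var5 = runif(5, 0, 5))

# Same specification as above: new column name -> source columns
l <- list(mean1 = c("var1", "var2"),
          mean2 = c("var3", "var4", "var5"))

# lapply() returns one vector of row means per list element;
# assigning to df[names(l)] adds them all as new columns in one step
df[names(l)] <- lapply(l, function(cols) rowMeans(df[cols]))
df
```

Because the new column names come from `names(l)`, adding another scale only means adding another element to the list.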

CodePudding user response：

@DaveArmstrong has a great answer that might be a bit more efficient than what I came up with. However, the code below might be a bit more intuitive. Either one works and will probably help if you do this kind of thing a lot. Thanks for all the help and, of course, I welcome even more solutions!

set.seed(23424)
df <- data.frame(var1 = runif(5, 0, 5),
                 var2 = runif(5, 0, 5),
                 var3 = runif(5, 0, 5),
                 var4 = runif(5, 0, 5),
                 var5 = runif(5, 0, 5))

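# Group the source columns for each new variable in a list
# (as.list() would silently ignore its second argument here)
l <- list(df[, 1:2],
          df[, 3:5])

vars <- c("mean1", "mean2")
# rowMeans() gives one mean per row for each group of columns
means <- lapply(l, rowMeans)

# Add each vector of row means to df under its new name
for(i in seq_along(means)){
  df[[vars[i]]] <- means[[i]]
}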
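
Since the title also mentions additive scales, here is a rough sketch of how the same idea could be wrapped in a small helper for repeated use. The names `add_scales`, `spec`, and `fun` are made up for illustration and do not come from any package; the sketch assumes all listed columns are numeric.

```r
# Hypothetical helper (not from any package): build several scale
# variables from a named list of column-name vectors.
add_scales <- function(data, spec, fun = rowMeans) {
  # Fail early if the specification names a column that does not exist
  missing_cols <- setdiff(unlist(spec), names(data))
  if (length(missing_cols) > 0) {
    stop("Unknown columns: ", paste(missing_cols, collapse = ", "))
  }
  # Apply the row-wise function to each group of columns
  data[names(spec)] <- lapply(spec, function(cols) fun(data[cols]))
  data
}

set.seed(23424)
df <- data.frame(var1 = runif(5, 0, 5),
                 var2 = runif(5, 0, 5),
                 var3 = runif(5, 0, 5),
                 var4 = runif(5, 0, 5),
                 var5 = runif(5, 0, 5))

spec <- list(scale1 = c("var1", "var2"),
             scale2 = c("var3", "var4", "var5"))

df_mean <- add_scales(df, spec)                 # mean scales
df_sum  <- add_scales(df, spec, fun = rowSums)  # additive scales
```

Passing `rowSums` instead of the default `rowMeans` switches from mean to additive scales without touching the rest of the code.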