Say I want to create two new variables, "mean1" and "mean2", where "mean1" is the average of "var1" and "var2", and "mean2" is the average of "var3", "var4", and "var5". Here is an example data frame:
set.seed(23424)
df <- data.frame(var1 = runif(5, 0, 5),
var2 = runif(5, 0, 5),
var3 = runif(5, 0, 5),
var4 = runif(5, 0, 5),
var5 = runif(5, 0, 5))
I could brute-force it with something like:
df$mean1 <- rowMeans(df[,1:2])
df$mean2 <- rowMeans(df[,3:5])
But if I had to do this a lot, with many groups of columns, it would get tedious and clunky. Is there a way to do this more efficiently? Whenever I try to use loops or apply statements for this, it never works correctly.
Thanks in advance!
CodePudding user response:
You could do something like this, where you keep the variables you want to aggregate in a named list (the list names become the new column names). There's probably a better, fully tidy way of doing this, but this works:
library(dplyr)
set.seed(23424)
df <- data.frame(var1 = runif(5, 0, 5),
var2 = runif(5, 0, 5),
var3 = runif(5, 0, 5),
var4 = runif(5, 0, 5),
var5 = runif(5, 0, 5))
l <- list(mean1 = c("var1", "var2"),
mean2 = c("var3", "var4", "var5"))
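# For each list element, add a column named after it holding the row means
# of its variables. across(all_of(...)) replaces cur_data(), which is
# deprecated as of dplyr 1.1.0.
for(i in seq_along(l)){
  df <- df %>%
    mutate(!!sym(names(l)[i]) := rowMeans(across(all_of(l[[i]]))))
}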
df
#> var1 var2 var3 var4 var5 mean1 mean2
#> 1 1.892178 0.2488837 4.3203682 2.6517051 1.2454473 1.070531 2.7391735
#> 2 1.390501 2.9131956 0.8851525 3.9931125 1.8389664 2.151848 2.2390771
#> 3 3.131567 4.8579541 0.1950122 3.9789130 4.6969826 3.994761 2.9569693
#> 4 4.425019 2.5628706 0.6257656 0.1144681 1.8303231 3.493945 0.8568523
#> 5 2.621068 4.7636304 1.2762756 1.1706242 0.1881539 3.692349 0.8783512
Created on 2022-05-11 by the reprex package (v2.0.1)
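For what it's worth, the same named-list idea can also be sketched in base R, without a loop or dplyr. This is only a sketch under the assumption that the list holds column names exactly as they appear in the data frame; `groups` is an illustrative name I made up for it.

```r
set.seed(23424)
df <- data.frame(var1 = runif(5, 0, 5),
                 var2 = runif(5, 0, 5),
                 var3 = runif(5, 0, 5),
                 var4 = runif(5, 0, 5),
                 var5 = runif(5, 0, 5))

# Hypothetical name: maps each new column to the columns it averages
groups <- list(mean1 = c("var1", "var2"),
               mean2 = c("var3", "var4", "var5"))

# lapply() returns one vector of row means per group;
# assigning by name adds them all as new columns in one step
df[names(groups)] <- lapply(groups, function(cols) {
  rowMeans(df[, cols, drop = FALSE])
})
df
```

Adding another average then only means adding another entry to `groups`.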
CodePudding user response:
@DaveArmstrong has a great answer that might be a bit more efficient than what I came up with. However, the code below might be a bit more intuitive. Either one works and will probably help if you're doing this kind of thing a lot. Thanks for all the help and, of course, I welcome even more solutions!
set.seed(23424)
df <- data.frame(var1 = runif(5, 0, 5),
var2 = runif(5, 0, 5),
var3 = runif(5, 0, 5),
var4 = runif(5, 0, 5),
var5 = runif(5, 0, 5))
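# Group the columns to average; list() keeps both groups
# (as.list() would silently drop the second argument)
l <- list(df[, 1:2],
          df[, 3:5])
vars <- c("mean1", "mean2")
# rowMeans() returns one mean per row for each group of columns
means <- lapply(l, rowMeans)
# Attach each vector of row means under its new name
for(i in seq_along(means)){
  df[[vars[i]]] <- means[[i]]
}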
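One thing worth checking with list-based approaches like this is which averaging function goes into `lapply()`: `mean()` collapses a vector to a single number, while `rowMeans()` returns one value per row. A minimal sketch on a made-up three-row data frame (`d`, `col_means`, and `row_means` are illustrative names only):

```r
# Toy data, chosen so the means are easy to check by hand
d <- data.frame(a = c(1, 2, 3),
                b = c(3, 6, 9))

# mean() applied column by column: one number per column
col_means <- sapply(d, mean)

# rowMeans(): one number per row, which is what a new
# "average of these variables" column needs
row_means <- rowMeans(d)

col_means
row_means
```

Here `col_means` has two elements (one per column) and `row_means` has three (one per row), so only the latter has the right length to be stored as a new column.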