There are lots of questions on SO with similar titles, but I can't find any that match my circumstances, and I can't adapt them to resolve my error. From what I understand there is an issue with object lengths, but I don't understand why.
I'm looking for a base R solution to calculate the means of multiple columns in a dataframe. It's complicated because this will be used within a larger function and (a) the names and numbers of columns may vary, and (b) the names and numbers of grouping variable(s) will vary. I keep getting the "variable lengths differ (found for 'Group')" error. Perhaps I need a different way to specify the columns to aggregate?
# Example data
df <- data.frame("Location" = rep(LETTERS[1:16], each = 100),
                 "Group" = sample(1:200, size = 1600, replace = TRUE),
                 "Type" = rep(rep(c("Big", "Small"), each = 100), times = 8),
                 "Var.1" = rnorm(1600, mean = 10),
                 "Var.2" = rnorm(1600, mean = 5),
                 "Var.3" = rnorm(1600, mean = 42),
                 "Var.4" = rnorm(1600, mean = 250))
# Direct call to aggregate, works as expected, returns means of the Var columns.
df.means <- aggregate(cbind(Var.1, Var.2, Var.3, Var.4) ~ Group + Type,
                      data = df, FUN = mean)
## More flexible approach not working...
# Create a string identifying the column names for aggregate,
# needs to be flexible as length(df) is variable.
cols.to.agg <- noquote(paste(colnames(df)[4:length(df)], collapse = " , "))
# Grouping variable, here it is just the one column "Type",
# but cannot assume this is fixed.
grouping.col <- noquote(colnames(df)[3])
# Couple of approaches, but they fail with
# "variable lengths differ (found for 'Group')"
df.means <- aggregate(cbind(cols.to.agg) ~ Group + grouping.col,
                      data = df, FUN = mean)
df.means <- aggregate(as.formula(paste0("cbind(cols.to.agg) ~ Group + ", grouping.col)),
                      data = df, FUN = mean)
So, I'm looking to return df.means but with flexibility in names and numbers of columns.
CodePudding user response:
I wouldn't use noquote. Just concatenate the column names as strings and then convert the result to a formula.
cols.to.agg <- colnames(df)[4:length(df)]
grouping.col <- colnames(df)[3]
form <- paste0('cbind(',
               paste(cols.to.agg, collapse=', '),
               ') ~ Group + ',
               paste(grouping.col, collapse=' + '))
df.means <- aggregate(as.formula(form), data = df, FUN = mean)
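As far as I can tell, the original attempts fail because aggregate evaluates cols.to.agg as a single length-1 character variable rather than as the columns it names, so its length doesn't match Group. Building the whole formula as a string avoids that. Here is a minimal sketch of how this could be wrapped for use inside a larger function. The helper name agg_means, its argument names, and the toy data are made up for illustration and are not from the question; it also assumes the column names are syntactically valid R names.

```r
# Hypothetical helper (not from the original post): build the formula
# "cbind(v1, v2, ...) ~ g1 + g2 + ..." from character vectors of names.
agg_means <- function(data, value.cols, group.cols) {
  form <- as.formula(paste0("cbind(", paste(value.cols, collapse = ", "), ") ~ ",
                            paste(group.cols, collapse = " + ")))
  aggregate(form, data = data, FUN = mean)
}

# Small made-up data set so the sketch runs on its own.
set.seed(1)
toy <- data.frame(Group = rep(1:2, each = 4),
                  Type  = rep(c("Big", "Small"), times = 4),
                  Var.1 = rnorm(8),
                  Var.2 = rnorm(8))

# Any number of value columns and grouping columns can be passed in.
res <- agg_means(toy,
                 value.cols = c("Var.1", "Var.2"),
                 group.cols = c("Group", "Type"))
```

With the data from the question, the equivalent call would be agg_means(df, colnames(df)[4:length(df)], c("Group", colnames(df)[3])).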
CodePudding user response:
You could just use a formula.
cols.to.agg <- sprintf('cbind(%s)', paste(colnames(df)[4:length(df)], collapse=", "))
grouping.col <- colnames(df)[3]
(fo <- as.formula(paste(paste(cols.to.agg, grouping.col, sep=' ~ '), '+ Group')))
# cbind(Var.1, Var.2, Var.3, Var.4) ~ Type + Group
aggregate(fo, data=df, FUN=mean) |> head()
#    Type Group     Var.1    Var.2    Var.3    Var.4
# 1   Big     1  8.768814 4.766736 42.32862 249.5582
# 2 Small     1  9.687069 4.355155 41.20841 249.8893
# 3   Big     2  9.890003 5.057623 42.08073 249.6100
# 4 Small     2  9.150614 5.551472 41.12499 249.3057
# 5   Big     3 10.069426 4.562794 41.51709 249.3241
# 6 Small     3  9.993231 4.527602 42.07376 249.9811
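If building a formula from strings feels fragile, another base R option is to skip the formula and use the data frame method of aggregate, passing the columns by name through its x and by arguments. This is a sketch on made-up toy data; the names value.cols, group.cols, res.by and res.formula are only for illustration.

```r
# Small made-up data set so the sketch runs on its own.
set.seed(1)
toy <- data.frame(Group = rep(1:2, each = 4),
                  Type  = rep(c("Big", "Small"), times = 4),
                  Var.1 = rnorm(8),
                  Var.2 = rnorm(8))

# Column names held as plain character vectors, so they can vary freely.
value.cols <- c("Var.1", "Var.2")
group.cols <- c("Group", "Type")

# Data frame method: x holds the columns to summarise, by the grouping columns.
res.by <- aggregate(x = toy[value.cols], by = toy[group.cols], FUN = mean)

# Same result via the formula interface, for comparison.
res.formula <- aggregate(cbind(Var.1, Var.2) ~ Group + Type, data = toy, FUN = mean)
```

One difference to keep in mind: the formula method drops rows with missing values by default (na.action = na.omit), while the data frame method passes them on to FUN, so the two can disagree when the data contain NAs.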