R aggregate variable lengths differ error

Time:07-12

There are lots of questions on SO with similar titles, but I can't find any that match my circumstances, and I can't adapt their answers to resolve my error. From what I understand, there is an issue with object lengths, but I don't understand why.

I'm looking for a base R solution to calculate the means of multiple columns in a data frame. It's complicated because this will be used within a larger function, where (a) the names and number of columns to aggregate may vary, and (b) the names and number of grouping variables will vary. I keep getting the error "variable lengths differ (found for 'Group')"; perhaps I need a different way to specify the columns to aggregate?

# Example data
df <- data.frame("Location" = rep(LETTERS[1:16], each = 100), 
                 "Group" = sample(1:200, size = 1600, replace = TRUE), 
                 "Type" = rep(rep(c("Big", "Small"), each = 100), times = 8), 
                 "Var.1" = rnorm(1600, mean = 10), 
                 "Var.2" = rnorm(1600, mean = 5), 
                 "Var.3" = rnorm(1600, mean = 42), 
                 "Var.4" = rnorm(1600, mean = 250))

# Direct call to aggregate, works as expected, returns means of the Var columns.
df.means <- aggregate(cbind(Var.1, Var.2, Var.3, Var.4) ~ Group + Type, 
            data = df, FUN = mean)


## More flexible approach not working...

# Create a string identifying the column names for aggregate,
# needs to be flexible as length(df) is variable.
cols.to.agg <- noquote(paste(colnames(df)[4:length(df)], collapse = " , "))

# Grouping variable, here it is just the one column "Type", 
# but cannot assume this is fixed.
grouping.col <- noquote(colnames(df)[3])

# Couple of approaches, but they fail with 
# "variable lengths differ (found for 'Group')"
df.means <- aggregate(cbind(cols.to.agg) ~ Group + grouping.col,
            data = df, FUN = mean)
df.means <- aggregate(as.formula(paste0("cbind(cols.to.agg) ~ Group + ",
              grouping.col)), data = df, FUN = mean)

So, I'm looking to return df.means but with flexibility in names and numbers of columns.

CodePudding user response:

I wouldn't use noquote, since it only changes how a string is printed. Instead, concatenate the column names into a single string and then convert that string to a formula.

cols.to.agg <- colnames(df)[4:length(df)]
grouping.col <- colnames(df)[3]
form <- paste0('cbind(', 
               paste(cols.to.agg, collapse=', '), 
               ') ~ Group + ',
               paste(grouping.col, collapse=' + '))
df.means <- aggregate(as.formula(form), data = df, FUN = mean)
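
To see why the attempts in the question fail, it may help to inspect what `cbind(cols.to.agg)` actually contains. The sketch below uses a small stand-in data frame with made-up values (an assumption, not the question's data): the `noquote` object is still a single character string, so `cbind()` builds a one-row matrix of text rather than binding the data columns, and `aggregate` then finds a response of a different length than `Group`.

```r
# Small stand-in for the question's data (hypothetical values).
df <- data.frame(Group = c(1, 1, 2, 2),
                 Type  = c("Big", "Small", "Big", "Small"),
                 Var.1 = c(1, 2, 3, 4),
                 Var.2 = c(5, 6, 7, 8))

cols.to.agg <- noquote(paste(colnames(df)[3:length(df)], collapse = ", "))

# noquote() only affects printing; the object is still one character string.
is.character(cols.to.agg)
length(cols.to.agg)

# cbind() of that string is a 1 x 1 character matrix, not the Var columns.
dim(cbind(cols.to.agg))

# The response therefore has 1 row while Group has nrow(df) rows,
# so the call errors; try() captures the error instead of stopping.
res <- try(aggregate(cbind(cols.to.agg) ~ Group, data = df, FUN = mean),
           silent = TRUE)
inherits(res, "try-error")
```

Building the whole formula as text first, as in the answer above, avoids this because the column names become part of the formula itself rather than the value of a variable.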

CodePudding user response:

You could just use a formula.

cols.to.agg <- sprintf('cbind(%s)', paste(colnames(df)[4:length(df)], collapse=", "))
grouping.col <- colnames(df)[3]
(fo <- as.formula(paste(paste(cols.to.agg, grouping.col, sep=' ~ '), '+ Group')))
# cbind(Var.1, Var.2, Var.3, Var.4) ~ Type + Group

aggregate(fo, data=df, FUN=mean) |> head()
#    Type Group     Var.1    Var.2    Var.3    Var.4
# 1   Big     1  8.768814 4.766736 42.32862 249.5582
# 2 Small     1  9.687069 4.355155 41.20841 249.8893
# 3   Big     2  9.890003 5.057623 42.08073 249.6100
# 4 Small     2  9.150614 5.551472 41.12499 249.3057
# 5   Big     3 10.069426 4.562794 41.51709 249.3241
# 6 Small     3  9.993231 4.527602 42.07376 249.9811
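
Since the question mentions using this inside a larger function, one alternative that might be worth considering is the data frame method of `aggregate`, which takes column names directly and needs no formula string at all. This is a minimal sketch, not the approach from either answer; the helper name `group_means` and its arguments are hypothetical, and the example data and seed are made up.

```r
# Hypothetical helper (not from the thread): group means without a formula.
group_means <- function(data, value.cols, group.cols) {
  # data[value.cols] and data[group.cols] are data frames (lists of columns),
  # which is what aggregate()'s data frame method expects for `x` and `by`.
  aggregate(data[value.cols], by = data[group.cols], FUN = mean)
}

set.seed(1)  # arbitrary seed, only so the example is reproducible
df <- data.frame(Group = rep(1:2, each = 4),
                 Type  = rep(c("Big", "Small"), times = 4),
                 Var.1 = rnorm(8, mean = 10),
                 Var.2 = rnorm(8, mean = 5))

# Column names are plain character vectors, so their number can vary freely.
res <- group_means(df,
                   value.cols = c("Var.1", "Var.2"),
                   group.cols = c("Group", "Type"))
res
```

One difference to keep in mind: the formula method drops rows containing `NA` by default (`na.action = na.omit`), whereas this form passes them to `mean`, so `na.rm = TRUE` may need to be supplied if the data has missing values.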