Error when using dplyr {{ }} with aggregate inside a function

Time:11-17

I am trying to use aggregate() inside a function, using dplyr's {{ }} notation to select the column to aggregate on.

filter <- function(df, level) {
 df <- aggregate(.~ {{level}}, data=df, FUN=sum) 
 return(df)
}

However, I get this error:

Error in model.frame.default(formula = cbind(phylum, '12K1B.txt', '12K2B.txt',  :  
variable lengths differ (found for '{{ level }}')

I have double-checked my data and there are no missing or NA values. Everything works as expected when I run the same aggregate() call outside of the function, so I am not sure what is causing the error.

CodePudding user response:

{{ }} is tidyverse (tidy evaluation) syntax and only has its special meaning inside functions that support it, such as dplyr verbs. Base R's aggregate() does not.
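To see why, it may help to look at what base R does with double braces. The sketch below (the variable names are only for illustration) shows that `{{ }}` is parsed as two nested calls to the ordinary `{` function, so it simply evaluates whatever is inside. In the question's formula that likely means the value passed as level is used rather than the column it names, which would explain the "variable lengths differ" message.

```r
# In base R, `{` is an ordinary function; doubling it changes nothing.
x <- 1:3
plain  <- sum(x)
braced <- {{ sum(x) }}
identical(plain, braced)

# The parser sees nested calls to `{`, not a special operator.
expr <- quote({{ level }})
expr[[1]]   # the function being called is `{`
```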

Suppose we want a function that reproduces a call like this:

aggregate(. ~ Species, data = iris, sum)
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        250.3       171.4         73.1        12.3
2 versicolor        296.8       138.5        213.0        66.3
3  virginica        329.4       148.7        277.6       101.3

We can build the formula on the fly by capturing the unquoted column name as text and pasting it into a formula string:

aggregate_var <- function(df, level) {
  level <- deparse(substitute(level))
  aggregate(formula(paste(". ~", level)), data=df, FUN=sum) 
}

aggregate_var(iris, Species)
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        250.3       171.4         73.1        12.3
2 versicolor        296.8       138.5        213.0        66.3
3  virginica        329.4       148.7        277.6       101.3
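The two steps inside aggregate_var() can be inspected separately. This is a minimal sketch; capture_name() is a hypothetical helper introduced only to show what deparse(substitute()) returns before the formula is built.

```r
# capture_name() is a hypothetical helper, not part of the answer above.
# substitute() returns the unevaluated argument; deparse() turns it into text.
capture_name <- function(level) {
  deparse(substitute(level))
}

# Species is never evaluated here, so no object of that name needs to exist.
label <- capture_name(Species)

# The text is then pasted into a formula string and converted to a formula.
fml <- formula(paste(". ~", label))
label
fml
```

Because the column name is captured as text, this approach expects a bare name such as Species; passing a string would need the deparse(substitute()) step removed.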

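As an aside: filter is already the name of widely used functions (stats::filter() and dplyr::filter()), so a more descriptive name avoids masking them. Also note that the explicit return() and the assignment to df are not needed here, because a function returns the value of its last expression.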

CodePudding user response:

You can use get() to look up the column by name. Note that level is passed as a string here.

funn <- function(df, level){
  df <- aggregate(.~ get(level), data=df, FUN=sum) 
  return(df)
}
funn(iris, "Species")

  get(level) Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1     setosa        250.3       171.4         73.1        12.3      50
2 versicolor        296.8       138.5        213.0        66.3     100
3  virginica        329.4       148.7        277.6       101.3     150
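The output above shows two side effects of get(): the grouping column is labelled get(level), and because . expands to every column in the data, Species itself is summed through its integer factor codes. One way to avoid both while still passing a string is to build the formula with reformulate(). This is a sketch under that assumption; aggregate_by() is a hypothetical name.

```r
# aggregate_by() is a hypothetical name used for illustration.
# level: a single column name supplied as a character string.
aggregate_by <- function(df, level) {
  stopifnot(is.character(level), length(level) == 1, level %in% names(df))
  # reformulate() builds the formula `. ~ <level>` from strings.
  fml <- reformulate(level, response = ".")
  aggregate(fml, data = df, FUN = sum)
}

res <- aggregate_by(iris, "Species")
names(res)
```

With a real column name on the right-hand side, aggregate() excludes that column from the dot expansion and uses its name for the grouping column.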

CodePudding user response:

To complement the other answers: if you want to use {{ }}, the dplyr approach is:

library(dplyr)

my_fun <- function(df, level) {
  # {{ level }} forwards the unquoted column name to group_by()
  df |>
    group_by({{ level }}) |>
    summarize(across(everything(), sum))
}

my_fun(iris, Species)
# A tibble: 3 x 5
  Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
  <fct>             <dbl>       <dbl>        <dbl>       <dbl>
1 setosa             250.        171.         73.1        12.3
2 versicolor         297.        138.        213          66.3
3 virginica          329.        149.        278.        101. 