How to use a variable name in a formula instead of the column itself

Time:10-18

I have data that I would like to summarize by group using the summary_by function (from the doBy package). Instead of writing the column names directly in the summary_by formula, I want to use variables I created earlier that hold those names.
Below is the result I would like to achieve:

library(data.table)
library(doBy)

mtcars = data.table(mtcars)

doBy::summary_by(data = mtcars, mpg ~ gear + am, FUN = "mean")

output:

gear  am   mpg."mean"
3     0    16.10667
4     0    21.05000
4     1    26.27500
5     1    21.38000

Here is what I want to do:

library(data.table)
library(doBy)

mtcars = data.table(mtcars)

variable1 = "gear" # which is a column name of mtcars
variable2 = "am" # which is a column name of mtcars
variable3 = "mpg" # which is a column name of mtcars

doBy::summary_by(data = mtcars, variable3 ~ variable1 + variable2, FUN = "mean")

I tried the functions get, assign, eval, and mget, but I couldn't find a solution.

CodePudding user response:

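Build the formula from a string instead of writing it literally. A literal formula relies on non-standard evaluation: its terms are not evaluated, so variable3 ~ variable1 + variable2 looks for columns literally named variable1, variable2, and variable3. Converting a string with as.formula() inserts the stored column names instead.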

library(data.table)
library(doBy)

mtcars = data.table(mtcars)

variable1 = "gear" # which is a column name of mtcars
variable2 = "am" # which is a column name of mtcars
variable3 = "mpg" # which is a column name of mtcars

doBy::summary_by(data = mtcars, 
                 # as an alternative to sprintf(), use paste() or glue()
                 as.formula(sprintf("%s ~ %s + %s", variable3, variable1, variable2)), 
                 FUN = "mean")
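
The same idea extends to any number of grouping columns. Here is a minimal sketch in base R, assuming the column names are held in a character vector (response_var and group_vars are names made up for this illustration): paste() with collapse builds the right-hand side, and as.formula() turns the assembled string into a formula.

```r
# Hypothetical variable names, chosen for this illustration only
response_var <- "mpg"
group_vars   <- c("gear", "am")

# Join the grouping columns with " + " to form the right-hand side
rhs <- paste(group_vars, collapse = " + ")

# Convert the assembled string into a formula object
f <- as.formula(paste(response_var, "~", rhs))

print(f)
```

The resulting f can then be passed as the formula argument of summary_by in place of the sprintf() call.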

CodePudding user response:

Thanks @mnist, it works!

I just found two other ways:

library(data.table)
library(doBy)

mtcars = data.table(mtcars)

variable1 = "gear" # which is a column name of mtcars
variable2 = "am" # which is a column name of mtcars
variable3 = "mpg" # which is a column name of mtcars
  • summary_by solution with the reformulate function:

    summary_by(data = mtcars, reformulate(
        termlabels = c(variable1, variable2),
        response = variable3)
    )
    
  • Native data.table way:

    mtcars[, mean(get(variable3)), by = mget(c(variable1, variable2))]
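
data.table also accepts character vectors directly, which avoids get() and mget(). Here is a sketch, assuming the same three variables as above (dt and res are names introduced only for this illustration): by takes a character vector of column names, and .SDcols selects the column to summarize.

```r
library(data.table)

dt <- as.data.table(mtcars)

variable1 <- "gear"
variable2 <- "am"
variable3 <- "mpg"

# by accepts a character vector of column names;
# .SDcols restricts .SD to the column(s) to summarize
res <- dt[, lapply(.SD, mean), by = c(variable1, variable2), .SDcols = variable3]

print(res)
```

One side effect of this form is that the summary column keeps its original name (mpg) rather than the default V1.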
    