I'm struggling to understand how to use the {{ }} operator to pass bare variable names in custom functions. I get an error when I use the operator in conjunction with an if clause.
This function works:
f <- function(.data, .vars=NULL){
  require(dplyr)
  df = select(.data, {{ .vars }})
  print(head(df))
}
f(iris, c(Species, Sepal.Length))
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> Species Sepal.Length
#> 1 setosa 5.1
#> 2 setosa 4.9
#> 3 setosa 4.7
#> 4 setosa 4.6
#> 5 setosa 5.0
#> 6 setosa 5.4
Created on 2021-12-20 by the reprex package (v2.0.1)
If I try to add an if clause, it throws an error:
f <- function(.data, .vars=NULL){
  require(dplyr)
  if(!is.null(.vars)) df = select(.data, {{ .vars }})
  else df = .data
  print(head(df))
}
f(iris, c(Species, Sepal.Length))
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> Error in f(iris, c(Species, Sepal.Length)): object 'Species' not found
Created on 2021-12-20 by the reprex package (v2.0.1)
What am I missing?
CodePudding user response:
I think the easiest explanation is that checking whether .vars is not NULL forces R to evaluate it. R will interpret the value (in your example: c(Species, Sepal.Length)) as a vector of ordinary variables, and look for these variables in your environment. Since you don't have any variable called Species, it throws an error.
You can fix it like this:
library(dplyr)
f <- function(.data, .vars = NULL){
  vars <- enquo(.vars)
  if(!rlang::quo_is_null(vars)) df = select(.data, !!vars)
  else df = .data
  print(head(df))
}
f(iris)
f(iris, c(Species, Sepal.Length))
Note that {{x}} is actually a shorthand for !!enquo(x).
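As a minimal sketch of that equivalence, the two helpers below should select the same columns. The names pick_embrace and pick_enquo are made up for illustration and are not part of dplyr or rlang.

```r
library(dplyr)

# Hypothetical helper: forwards the bare column names with the embrace operator
pick_embrace <- function(.data, .vars) {
  select(.data, {{ .vars }})
}

# Hypothetical helper: the same thing spelled out with enquo() and !!
pick_enquo <- function(.data, .vars) {
  vars <- enquo(.vars)   # capture the expression without evaluating it
  select(.data, !!vars)  # inject the captured expression into select()
}

a <- pick_embrace(iris, c(Species, Sepal.Length))
b <- pick_enquo(iris, c(Species, Sepal.Length))
identical(a, b)
```

The explicit enquo() form is mainly useful when you need to inspect the captured expression before using it, as in the quo_is_null() check above.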
Elaboration (update)
When you don't use if, the only place .vars is being used is inside dplyr::select(.data, {{ .vars }}). In this context, the variable names in .vars are interpreted as being variables in the data frame .data.
When you add the if statement, .vars is evaluated as being variables in your environment. Since they don't exist in your environment you get an error.
This is called data-masking (for select(), strictly speaking, it is the closely related tidy selection). Here is a nice article about it.
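To make the difference concrete, here is a small sketch that leaves dplyr out entirely. The function names touches_arg and captures_arg are hypothetical, and the sketch assumes there is no object called Species in the calling environment.

```r
# Hypothetical function: is.null() has to evaluate .vars to inspect its value
touches_arg <- function(.vars = NULL) {
  is.null(.vars)
}

# Hypothetical function: enquo() captures the expression, nothing is evaluated
captures_arg <- function(.vars = NULL) {
  rlang::quo_is_null(rlang::enquo(.vars))
}

# Evaluating c(Species, Sepal.Length) as ordinary R code fails,
# so we catch the error and keep its message
res <- tryCatch(
  touches_arg(c(Species, Sepal.Length)),
  error = function(e) conditionMessage(e)
)
print(res)

# The capturing version never looks up Species, so it does not fail
captures_arg(c(Species, Sepal.Length))
captures_arg()
```

The first call fails for the same reason as the if clause in the question: the argument is evaluated before select() ever gets a chance to look the names up in the data.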
CodePudding user response:
@jpiversen's answer and explanation are correct, but here's a simpler fix for your function. Instead of looking for the default value of NULL, just check if .vars is missing:
library(dplyr)
f <- function(.data, .vars){
  if(!missing(.vars)) df = select(.data, {{ .vars }})
  else df = .data
  print(head(df))
}
f(iris, c(Species, Sepal.Length))
By the way, I also removed require(dplyr) from your function. It's generally a bad idea to use it in a function, because of the side effect of changing the search list. Use requireNamespace("dplyr") and prefix functions using dplyr:: if you're not sure it will be available.
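A sketch of what that could look like for the function above. The wording of the error message is my own choice, and failing with stop() is only one way to handle a missing package.

```r
f <- function(.data, .vars) {
  # Check that dplyr is installed without attaching it to the search list
  if (!requireNamespace("dplyr", quietly = TRUE)) {
    stop("Package 'dplyr' is required for f().")
  }
  if (!missing(.vars)) {
    # Namespace-qualified call, so it works without library(dplyr)
    df <- dplyr::select(.data, {{ .vars }})
  } else {
    df <- .data
  }
  print(head(df))
}

f(iris, c(Species, Sepal.Length))
f(iris)
```

The {{ }} operator itself does not need to be imported, so the function works even when neither dplyr nor rlang is attached.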