R: Why is a free variable within a function recognized as an unquoted column name?

Time:08-05

My understanding is that R's lexical scoping resolves a free variable within a function by searching the environment in which the function was defined, and then that environment's parents. However, I am seeking help reconciling this with why the function call below does not give me an error about such a variable.
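A minimal base-R sketch of the lexical-scoping rule described above. make_adder, add10, uses_missing and undefined_var_for_demo are hypothetical names chosen only for illustration:

```r
# Lexical scoping: a free variable is resolved in the environment where the
# function was created, then in that environment's parents.
make_adder <- function() {
  offset <- 10              # lives in make_adder()'s execution environment
  function(n) n + offset    # `offset` is a free variable of this inner function
}
add10 <- make_adder()
add10(1)  # 11: `offset` is found in the defining environment

# If a free variable exists nowhere in that chain, the lookup fails, but only
# when the body is actually evaluated (defining the function is fine).
uses_missing <- function() undefined_var_for_demo + 1  # hypothetical name
res <- tryCatch(uses_missing(), error = function(e) conditionMessage(e))
res  # "object 'undefined_var_for_demo' not found"
```

The timing matters for the question: R only looks up a free variable when the code that mentions it is actually evaluated in the ordinary way.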

Suppose I define a function foo in the global environment and pass it arguments that are either objects in the global environment (e.g., a data.frame) or the unquoted names of that object's columns.

library(dplyr)

# Example input objects
dv <- "c"
df <- data.frame(x = rep(c(3,NA_real_), 5),
                 y = letters[1:10],
                 z = 1:10)

# Define a function
foo <- function(df, dv, response, treat) {
  df %>%
    filter(y %in% dv) %>%
    filter(!is.na(response)) %>%
    select(treat)
}

My understanding is that y is a free variable here, so I would expect R to look for y in the global environment where foo was defined, find nothing, and throw an error. However, the errors I get are unrelated to y:

foo(df = df, dv = dv, response = x, treat = z)
#> Error in `filter()`:
#> ! Problem while computing `..1 = !is.na(response)`.
#> Caused by error in `mask$eval_all_filter()`:
#> ! object 'x' not found
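One way to check the premise, assuming only the codetools package (a recommended package that ships with R): codetools::findGlobals() statically lists the names a function body uses but does not define, without running it. A sketch:

```r
# findGlobals() from codetools walks a function body statically and lists the
# names it does not define locally, split into functions and variables.
foo <- function(df, dv, response, treat) {
  df %>%
    filter(y %in% dv) %>%
    filter(!is.na(response)) %>%
    select(treat)
}

# foo is only inspected, never called, so dplyr does not need to be loaded.
globals <- codetools::findGlobals(foo, merge = FALSE)
globals$variables  # "y": lexically free, exactly as the question expects
globals$functions  # the free function names, e.g. "%>%", "filter", "select"
```

So y really is a free variable of foo in the lexical sense; what differs is that filter() never evaluates it with ordinary scoping. The same static analysis is what R CMD check uses for its "no visible binding for global variable" notes, which is one reason package code tends to write .data$y.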

While we can fix those scoping errors by quoting and unquoting (see below), it is still unclear to me how y is recognized as an unquoted column name rather than producing an error.

foo_new <- function(df, dv, response, treat) {
  response <- enquo(response)
  treat <- enquo(treat)
  
  df %>%
    filter(y %in% dv) %>%
    filter(!is.na(!!response)) %>%
    select(!!treat)
}

foo_new(df, dv, x, z)
#>   z
#> 1 3

Answer:

It might help to make things more explicit with regard to quoted vs. unquoted expressions and the environments that objects come from. If I were to put foo into an R package, this is what I'd do (using roxygen2 comments to document the expected type of each argument).

#' Test function
#' 
#' @param df A `data.frame`.
#' @param dv A `character` scalar.
#' @param response An unquoted expression corresponding to a column in `df`.
#' @param treat An unquoted expression corresponding to a column in `df`.
#' 
#' @importFrom dplyr filter select
#' @importFrom magrittr "%>%"
#' @importFrom rlang .data
foo_explicit <- function(df, dv, response, treat) {
    df %>%
        filter(.data$y %in% dv) %>%
        filter(!is.na({{ response }})) %>%
        select({{ treat }})
}

A few comments:

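  • .data$y inside filter() makes it explicit that y is a column within df. This is also why the original foo does not fail on y: filter() evaluates its arguments with data masking, so the columns of df are searched first, and only names that are not columns are looked up in the environment where the expression was written (foo's execution environment and its parents).
  • The dv argument is a character scalar within the foo_explicit environment. It is not a column of df, so the data mask falls back to that environment to find it.
  • The response and treat arguments are unquoted expressions. In foo, filter() only sees the symbol response; since that is not a column, it resolves to the function argument, and forcing that argument evaluates x in the global environment, hence "object 'x' not found". The curly-curly operator {{ }} is shorthand for the enquo() plus !! construct that you use in foo_new.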
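To make the data-masking explanation concrete without relying on dplyr internals, here is a simplified base-R sketch. filter_base(), foo_base() and df2 are hypothetical names; the substitute() plus eval(expr, data, parent.frame()) pattern is the same idea base::subset() uses, while dplyr's real data mask (rlang quosures and the .data/.env pronouns) does considerably more.

```r
# Hypothetical base-R analogue of data masking (not dplyr's implementation):
# capture the condition unevaluated, then evaluate it with the data frame's
# columns searched first and the caller's environment as the fallback.
filter_base <- function(.df, cond) {
  cond_expr <- substitute(cond)                  # capture, roughly like enquo()
  keep <- eval(cond_expr, .df, parent.frame())   # columns first, then caller
  .df[!is.na(keep) & keep, , drop = FALSE]       # treat NA as FALSE, like filter()
}

df <- data.frame(x = rep(c(3, NA_real_), 5),
                 y = letters[1:10],
                 z = 1:10)
dv <- "c"

# `y` is a column, so the mask finds it; `dv` is not, so it comes from the
# calling environment. Lexical lookup of `y` never happens.
filter_base(df, y %in% dv)$z  # 3

# Columns win over variables of the same name: with a `dv` column, the
# column is used instead of the global `dv`.
df2 <- transform(df, dv = "a")
filter_base(df2, y %in% dv)$z  # 1

# Wrapping the call reproduces the error from foo(): the captured expression is
# `!is.na(response)`; `response` is not a column, so it resolves to the
# wrapper's argument, and forcing that promise evaluates `x` in the global
# environment, where no `x` exists.
foo_base <- function(df, response) filter_base(df, !is.na(response))
err <- tryCatch(foo_base(df, x), error = function(e) conditionMessage(e))
err  # "object 'x' not found"
```

The three results line up with the question: y resolves as a column, a same-named column would shadow dv, and x fails because it is evaluated as an ordinary promise in the global environment rather than inside the mask.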