scoping/non-standard evaluation issue in glm's formula in a function in R

Time:11-30

I have a function that computes a table and a model (and more...):

fun <- function(x, y, formula = y ~ x, data = NULL) {
  out <- list()
  out$tab <- table(x, y)
  out$mod <- glm(formula = formula,
                 family = binomial,
                 data = data)
  out
}

In the formula, I need to use x and y as provided in the function call (e.g. x = DF1$x and y = DF1$y) and variables from another data frame (e.g. a and b from DF2). It fails with my naive function:

fun(x = DF1$x,
    y = DF1$y,
    formula = y ~ x + a + b,
    data = DF2)
# Error in eval(predvars, data, env) : object 'y' not found

How can I make glm look up x and y in the function's environment? I guess this issue is related to non-standard evaluation and/or scoping, but I have no idea how to fix it.

Data for the example:

smp <- function(x = c(TRUE, FALSE),
                size = 1e2) {
  sample(x = x,
         size = size,
         replace = TRUE)
}

DF1 <- data.frame(x = smp(),
                  y = smp())

DF2 <- data.frame(a = smp(x = LETTERS),
                  b = smp(x = LETTERS))

CodePudding user response:

Why not just add x and y into data in the function?

fun <- function(x, y, formula = y ~ x, data = NULL) {
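  # Allow data = NULL: start from an empty data frame with one row per observation
  if (is.null(data)) data <- data.frame(row.names = seq_along(x))
  # These are scalar comparisons, so use || rather than the vectorised |
  if (length(x) != length(y) || length(x) != nrow(data))
    stop("x, y and data need to be the same length.\n")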
  data$x <- x
  data$y <- y
  out <- list()
  out$tab <- table(x, y)
  out$mod <- glm(formula = formula,
                 family = binomial,
                 data = data)
  out
}

fun(x = DF1$x,
    y = DF1$y,
    formula = y ~ x + a + b,
    data = DF2)
# $tab
#        y
# x       FALSE TRUE
#   FALSE    27   29
#   TRUE     21   23
# 
# $mod
# Call:  glm(formula = formula, family = binomial, data = data)
# 
# Coefficients:
#   (Intercept)        xTRUE           aB           aC           aD           aE           aF           aG           aH           aI           aJ  
# 3.2761      -1.8197       0.3409     -93.9103      -2.0697      20.6813     -41.5963      -1.1078      18.5921      -1.0857     -36.5442  
# aK           aL           aM           aN           aO           aP           aQ           aR           aS           aT           aU  
# -0.5730     -92.5513      -3.0672      22.8989     -53.6200      -0.9450       0.4626      -3.0672       0.3570     -22.8857       1.8867  
# aV           aW           aX           aY           aZ           bB           bC           bD           bE           bF           bG  
# 2.5307      19.5447     -90.5693    -134.0656      -2.5943      -1.2333      20.7726     110.6790      17.1022      -0.5279      -1.2537  
# bH           bI           bJ           bK           bL           bM           bN           bO           bP           bQ           bR  
# -21.7750     114.0199      20.3766     -42.5031      41.1757     -24.3553      -2.0310     -25.9223      -2.9145      51.2537      70.2707  
# bS           bT           bU           bV           bW           bX           bY           bZ  
# -4.7728      -3.7300      -2.0333      -0.3906      -0.5717      -4.0728       0.8155      -4.4021  
# 
# Degrees of Freedom: 99 Total (i.e. Null);  48 Residual
# Null Deviance:        138.5 
# Residual Deviance: 57.73  AIC: 161.7
# 
# Warning message:
#   glm.fit: fitted probabilities numerically 0 or 1 occurred 
# 
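
As an alternative to copying x and y into data, here is a minimal sketch that changes where glm looks for variables it cannot find in data. Variables missing from data are looked up in the environment attached to the formula, which for a formula written in the call is the caller's environment and not the function's. Reassigning that environment inside the function should let x and y be found there. The name fun_env and the data frames d1 and d2 are made up for this illustration and are not part of the question.

```r
# Hypothetical variant of fun(): point the formula's environment at the
# function's own frame so that x and y are found there, while a and b
# are still taken from `data`.
fun_env <- function(x, y, formula = y ~ x, data = NULL) {
  environment(formula) <- environment()
  out <- list()
  out$tab <- table(x, y)
  out$mod <- glm(formula = formula,
                 family = binomial,
                 data = data)
  out
}

# Illustrative data (invented for this sketch, not the data from the question)
set.seed(1)
n <- 50
d1 <- data.frame(x = sample(c(TRUE, FALSE), n, replace = TRUE),
                 y = sample(c(TRUE, FALSE), n, replace = TRUE))
d2 <- data.frame(a = rnorm(n), b = rnorm(n))

res <- fun_env(x = d1$x, y = d1$y, formula = y ~ x + a + b, data = d2)
names(coef(res$mod))
```

One trade-off of this sketch: any other object the formula refers to that lives only in the caller's environment is now resolved through the function's frame first, so local names such as out or formula could shadow it. Adding x and y to data, as above, avoids that.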
