Apply logistic regression in a function in R

I want to run logistic regression for multiple parameters and store the different metrics, e.g. AUC. I wrote the function below, but when I call it I get the error "Error in eval(predvars, data, env) : object 'X0' not found", even though the variable exists in both my training and testing datasets. Any idea?

library(ROCR)  # provides prediction() and performance()

new.function <- function(a) {
  model = glm(extry~a,family=binomial("logit"),data = train_df)
  pred.prob <- predict(model,test_df, type='response')
  predictFull <- prediction(pred.prob, test_df$extry)
  auc_ROCR <- performance(predictFull, measure = "auc")

  my_list <- list("AUC" =  auc_ROCR)
  return(my_list) 
}

# Call new.function, supplying the column X0 as the argument.
les <- new.function(X0)

Answer:

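The main reason your function didn't work is that you are passing an object into a formula: extry ~ a makes glm() look for a variable called a, and when it falls back to your argument, R evaluates X0 outside train_df, where no such object exists. You can fix this by building the formula from strings with paste() and as.formula(), but that approach is ultimately quite limiting.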
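A minimal sketch of the formula-building route, assuming the predictor name is passed as a character string rather than a bare symbol. It uses base R's reformulate(), which builds the same formula that paste() plus as.formula() would. The helper name auc_for and the dummy data (same shape as the second answer's "Data used" section) are illustrative assumptions.

```r
library(ROCR)  # prediction() and performance()

# Dummy data (assumption): same shape as the second answer's "Data used" section
set.seed(1)
train_df <- data.frame(X0 = sample(100), X1 = sample(100))
train_df$extry <- rbinom(100, 1, (train_df$X0 + train_df$X1) / 200)
test_df <- data.frame(X0 = sample(100), X1 = sample(100))
test_df$extry <- rbinom(100, 1, (test_df$X0 + test_df$X1) / 200)

# Hypothetical helper: 'var' is a column name given as a string, e.g. "X0"
auc_for <- function(var) {
  # Builds extry ~ <var>; equivalent to as.formula(paste("extry ~", var))
  form <- reformulate(var, response = "extry")
  model <- glm(form, family = binomial("logit"), data = train_df)
  pred.prob <- predict(model, test_df, type = "response")
  predictFull <- prediction(pred.prob, test_df$extry)
  # Return the AUC as a plain number rather than a performance object
  performance(predictFull, measure = "auc")@y.values[[1]]
}

# One AUC per predictor, named by predictor
aucs <- sapply(c("X0", "X1"), auc_for)
aucs
```

Because the argument is a string, the helper can be driven by sapply() over any vector of column names, which covers the "multiple parameters" case without non-standard evaluation.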

I suggest instead that you consider using update(). This gives you more flexibility to try different variable combinations, or to change the training dataset, without rewriting the function.

model = glm(extry ~ 1, family = binomial("logit"), data = train_df)  # intercept-only base model
new.model = update(model, . ~ X0)  # refit as extry ~ X0


new.function <- function(model){
  pred.prob <- predict(model, test_df, type='response')
  predictFull <- prediction(pred.prob, test_df$extry)
  auc_ROCR <- performance(predictFull, measure = "auc")

  my_list <- list("AUC" =  auc_ROCR)
  return(my_list) 
}


les <- new.function(new.model)

The function can be further improved by passing test_df as a separate argument, so that you can evaluate the model on alternative testing data.
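A sketch of that improvement, assuming the same dummy data as the second answer. The test set becomes a second argument (test_data, a name chosen here for illustration), and update() derives several models from one intercept-only base fit.

```r
library(ROCR)  # prediction() and performance()

# Dummy data (assumption): same shape as the second answer's "Data used" section
set.seed(1)
train_df <- data.frame(X0 = sample(100), X1 = sample(100))
train_df$extry <- rbinom(100, 1, (train_df$X0 + train_df$X1) / 200)
test_df <- data.frame(X0 = sample(100), X1 = sample(100))
test_df$extry <- rbinom(100, 1, (test_df$X0 + test_df$X1) / 200)

# Intercept-only base model; update() replaces the right-hand side
base.model <- glm(extry ~ 1, family = binomial("logit"), data = train_df)
model.X0 <- update(base.model, . ~ X0)
model.both <- update(base.model, . ~ X0 + X1)

# The test data is now an argument instead of a global variable
new.function <- function(model, test_data) {
  pred.prob <- predict(model, test_data, type = "response")
  predictFull <- prediction(pred.prob, test_data$extry)
  auc_ROCR <- performance(predictFull, measure = "auc")
  list("AUC" = auc_ROCR)
}

res.test <- new.function(model.X0, test_df)
# The same model scored on a different dataset (here, the training data)
res.train <- new.function(model.X0, train_df)
res.both <- new.function(model.both, test_df)
```

Keeping model fitting outside the function means the scoring code never needs to know which predictors were used.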

Answer:

To run the function in the way you intended, you would need to use non-standard evaluation to capture the symbol and insert it in a formula. This can be done using match.call and as.formula. Here's a fully reproducible example using dummy data:

new.function <- function(a) {
  
  # Convert symbol to character
  a <- as.character(match.call()$a)
  
  # Build formula from character strings
  form <- as.formula(paste("extry", a, sep = "~"))
  
  model <- glm(form, family = binomial("logit"), data = train_df)
  pred.prob <- predict(model, test_df, type = 'response')
  predictFull <- ROCR::prediction(pred.prob, test_df$extry)
  auc_ROCR <- ROCR::performance(predictFull, "auc")

  list("AUC" =  auc_ROCR)
}

Now we can call the function in the way you intended:

new.function(X0)
#> $AUC
#> A performance instance
#>   'Area under the ROC curve'

new.function(X1)
#> $AUC
#> A performance instance
#>   'Area under the ROC curve'

If you want to see the actual area under the curve you would need to do:

new.function(X0)$AUC@y.values[[1]]
#> [1] 0.6599759

So you may wish to modify your function so that the list contains auc_ROCR@y.values[[1]] rather than auc_ROCR.
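Applying that suggestion gives the following sketch, which returns the AUC as a plain number. It keeps the match.call() approach unchanged and repeats the dummy data from "Data used" so that it runs on its own.

```r
# Dummy data: same as the "Data used" section
set.seed(1)
train_df <- data.frame(X0 = sample(100), X1 = sample(100))
train_df$extry <- rbinom(100, 1, (train_df$X0 + train_df$X1) / 200)
test_df <- data.frame(X0 = sample(100), X1 = sample(100))
test_df$extry <- rbinom(100, 1, (test_df$X0 + test_df$X1) / 200)

new.function <- function(a) {
  # Capture the unevaluated argument (e.g. the symbol X0) as a string
  a <- as.character(match.call()$a)
  form <- as.formula(paste("extry", a, sep = "~"))
  model <- glm(form, family = binomial("logit"), data = train_df)
  pred.prob <- predict(model, test_df, type = "response")
  predictFull <- ROCR::prediction(pred.prob, test_df$extry)
  auc_ROCR <- ROCR::performance(predictFull, "auc")
  # y.values is a list with one element per prediction run; take the first
  list("AUC" = auc_ROCR@y.values[[1]])
}

new.function(X0)$AUC
```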

Data used

set.seed(1)

train_df <- data.frame(X0 = sample(100), X1 = sample(100))
train_df$extry <- rbinom(100, 1, (train_df$X0 + train_df$X1)/200)

test_df  <- data.frame(X0 = sample(100), X1 = sample(100))
test_df$extry <- rbinom(100, 1, (test_df$X0 + test_df$X1)/200)

Created on 2022-06-29 by the reprex package (v2.0.1)