Apply logistic regression in a function in R

Time:06-29

I want to run logistic regression for multiple predictors and store metrics such as the AUC for each one. I wrote the function below, but when I call it I get the error "Error in eval(predvars, data, env) : object 'X0' not found", even though the variable exists in both my training and testing datasets. Any idea?

new.function <- function(a) {
  model = glm(extry~a,family=binomial("logit"),data = train_df)
  pred.prob <- predict(model,test_df, type='response')
  predictFull <- prediction(pred.prob, test_df$extry)
  auc_ROCR <- performance(predictFull, measure = "auc")

  my_list <- list("AUC" =  auc_ROCR)
  return(my_list) 
}

# Call the function new.function supplying the column X0 as an argument.
les <- new.function(X0)

CodePudding user response:

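The main reason your function doesn't work is that you are trying to pass an object into a formula. Inside glm, the formula extry ~ a refers to a variable literally named a. That is not a column of train_df, so R falls back to the function argument a and tries to evaluate X0 as a standalone object outside the data frame, where it does not exist. You can fix this by building the formula with paste and as.formula, but that is ultimately quite limiting.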
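As a minimal sketch of the formula-building fix, assuming the predictor name is passed as a character string rather than a bare symbol: reformulate() from base R's stats package builds the formula from strings. The helper name fit_auc and the dummy data are assumptions made for illustration, and the ROCR package is assumed to be installed.

```r
library(ROCR)  # provides prediction() and performance()

# Dummy data, assumed for illustration only
set.seed(1)
train_df <- data.frame(X0 = sample(100), X1 = sample(100))
train_df$extry <- rbinom(100, 1, (train_df$X0 + train_df$X1) / 200)
test_df <- data.frame(X0 = sample(100), X1 = sample(100))
test_df$extry <- rbinom(100, 1, (test_df$X0 + test_df$X1) / 200)

# fit_auc is a hypothetical helper; `var` is the predictor name as a string
fit_auc <- function(var) {
  form <- reformulate(var, response = "extry")  # e.g. extry ~ X0
  model <- glm(form, family = binomial("logit"), data = train_df)
  pred.prob <- predict(model, test_df, type = "response")
  pred <- prediction(pred.prob, test_df$extry)
  # Extract the numeric AUC from the performance object
  performance(pred, measure = "auc")@y.values[[1]]
}

# One AUC per predictor, returned as a named numeric vector
aucs <- sapply(c("X0", "X1"), fit_auc)
```

Because the names are plain strings, the same call works for any vector of column names.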

I suggest instead that you consider using update. This gives you more flexibility to try different combinations of variables, or to change the training dataset, without breaking the function.

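library(ROCR)  # provides prediction() and performance()

# Fit a starting model, then swap in the predictor of interest
model <- glm(extry ~ 1, family = binomial("logit"), data = train_df)
new.model <- update(model, . ~ X0)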


new.function <- function(model){
  pred.prob <- predict(model, test_df, type='response')
  predictFull <- prediction(pred.prob, test_df$extry)
  auc_ROCR <- performance(predictFull, measure = "auc")

  my_list <- list("AUC" =  auc_ROCR)
  return(my_list) 
}


les <- new.function(new.model)

The function can be further improved by passing test_df as a separate argument, so that you can evaluate the model on an alternative testing dataset.
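A rough sketch of that improvement, assuming the response column is still called extry: the helper name model_auc, its newdata and response arguments, and the dummy data are all hypothetical, and ROCR is assumed to be installed.

```r
library(ROCR)  # provides prediction() and performance()

# Dummy data, assumed for illustration only
set.seed(1)
train_df <- data.frame(X0 = sample(100), X1 = sample(100))
train_df$extry <- rbinom(100, 1, (train_df$X0 + train_df$X1) / 200)
test_df <- data.frame(X0 = sample(100), X1 = sample(100))
test_df$extry <- rbinom(100, 1, (test_df$X0 + test_df$X1) / 200)

# model_auc is a hypothetical helper: the test data is an explicit argument
model_auc <- function(model, newdata, response = "extry") {
  pred.prob <- predict(model, newdata, type = "response")
  pred <- prediction(pred.prob, newdata[[response]])
  performance(pred, measure = "auc")@y.values[[1]]
}

# Fit an intercept-only model once, then update it for each predictor
base.model <- glm(extry ~ 1, family = binomial("logit"), data = train_df)
model.X0 <- update(base.model, . ~ X0)
model.X1 <- update(base.model, . ~ X1)

auc_X0 <- model_auc(model.X0, test_df)
auc_X1 <- model_auc(model.X1, test_df)
```

Any other data frame with the same columns can be passed as newdata without touching the function body.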

CodePudding user response:

To run the function in the way you intended, you would need to use non-standard evaluation to capture the symbol and insert it in a formula. This can be done using match.call and as.formula. Here's a fully reproducible example using dummy data:

new.function <- function(a) {
  
  # Convert symbol to character
  a <- as.character(match.call()$a)
  
  # Build formula from character strings
  form <- as.formula(paste("extry", a, sep = "~"))
  
  model <- glm(form, family = binomial("logit"), data = train_df)
  pred.prob <- predict(model, test_df, type = 'response')
  predictFull <- ROCR::prediction(pred.prob, test_df$extry)
  auc_ROCR <- ROCR::performance(predictFull, "auc")

  list("AUC" =  auc_ROCR)
}

Now we can call the function in the way you intended:

new.function(X0)
#> $AUC
#> A performance instance
#>   'Area under the ROC curve'

new.function(X1)
#> $AUC
#> A performance instance
#>   'Area under the ROC curve'

If you want to see the actual area under the curve you would need to do:

new.function(X0)$AUC@y.values[[1]]
#> [1] 0.6599759

So you may wish to modify your function so that the list contains auc_ROCR@y.values[[1]] rather than auc_ROCR.
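One way that modification might look, keeping the match.call approach from this answer and storing the numeric value in the list. The dummy data is repeated so the block runs on its own; the aucs vector at the end is an illustrative assumption about how the results could be collected.

```r
# Dummy data, assumed for illustration only
set.seed(1)
train_df <- data.frame(X0 = sample(100), X1 = sample(100))
train_df$extry <- rbinom(100, 1, (train_df$X0 + train_df$X1) / 200)
test_df <- data.frame(X0 = sample(100), X1 = sample(100))
test_df$extry <- rbinom(100, 1, (test_df$X0 + test_df$X1) / 200)

new.function <- function(a) {
  # Convert the unevaluated symbol to a character string
  a <- as.character(match.call()$a)
  form <- as.formula(paste("extry", a, sep = "~"))

  model <- glm(form, family = binomial("logit"), data = train_df)
  pred.prob <- predict(model, test_df, type = "response")
  predictFull <- ROCR::prediction(pred.prob, test_df$extry)
  auc_ROCR <- ROCR::performance(predictFull, "auc")

  # Store the number itself rather than the performance object
  list("AUC" = auc_ROCR@y.values[[1]])
}

# Collect the AUC for each predictor in a named vector
aucs <- c(X0 = new.function(X0)$AUC, X1 = new.function(X1)$AUC)
```

Note that match.call captures the symbol exactly as typed, so each predictor has to be written out in its own call.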


Data used

set.seed(1)

train_df <- data.frame(X0 = sample(100), X1 = sample(100))
train_df$extry <- rbinom(100, 1, (train_df$X0 + train_df$X1)/200)

test_df  <- data.frame(X0 = sample(100), X1 = sample(100))
test_df$extry <- rbinom(100, 1, (test_df$X0 + test_df$X1)/200)

Created on 2022-06-29 by the reprex package (v2.0.1)
