I'm trying to use a function that calls on the pROC package in R to calculate the area under the curve for a number of different outcomes.
# Function used to compute area under the curve
proc_auc <- function(outcome_var, predictor_var) {
  pROC::auc(outcome_var, predictor_var)
}
To do this, I intend to refer to the outcome names stored in a vector (as below).
# Create a vector of outcome names
outcome <- c('outcome_1', 'outcome_2')
However, I am having problems defining the variables to pass into this function. When I do this, I get the error: "Error in roc.default(response, predictor, auc = TRUE, ...): 'response' must have two levels". I can't work out why, as I reckon the outcome only has two levels...
I would be so happy if anyone could help me!
Here is a reproducible example using the iris dataset in R.
library(pROC)
library(datasets)
library(dplyr)
# Use iris dataset to generate binary variables needed for function
df <- iris %>%
  dplyr::mutate(outcome_1 = as.numeric(ntile(Sepal.Length, 4) == 4),
                outcome_2 = as.numeric(ntile(Petal.Length, 4) == 4)) %>%
  dplyr::rename(predictor_1 = Petal.Width)
# Inspect binary outcome variables
df %>% group_by(outcome_1) %>% summarise(n = n()) %>% mutate(Freq = n/sum(n))
df %>% group_by(outcome_2) %>% summarise(n = n()) %>% mutate(Freq = n/sum(n))
# Function used to compute area under the curve
proc_auc <- function(outcome_var, predictor_var) {
  pROC::auc(outcome_var, predictor_var)
}
# Create a vector of outcome names
outcome <- c('outcome_1', 'outcome_2')
# Define variables to go into function
outcome_var <- df %>% dplyr::select(outcome[[1]])
predictor_var <- df %>% dplyr::select(predictor_1)
# Use function - first line works but not last line!
proc_auc(df$outcome_1, df$predictor_1)
proc_auc(outcome_var, predictor_var)
CodePudding user response:
outcome_var and predictor_var are data frames with one column each (dplyr::select always returns a data frame), which means they cannot be passed directly as arguments to the auc function, which expects vectors.
Extract the columns as vectors, for example with $, and it will work.
proc_auc(outcome_var$outcome_1, predictor_var$predictor_1)
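Since the goal is to go through a vector of outcome names, here is a minimal sketch of one way to do that with base R's [[ ]], which returns the column as a plain vector. It reuses the data preparation and proc_auc from the question; the sapply loop and the aucs name are only illustrative, not part of the original code.

```r
library(pROC)
library(dplyr)

# Same data preparation as in the question
df <- iris %>%
  dplyr::mutate(outcome_1 = as.numeric(ntile(Sepal.Length, 4) == 4),
                outcome_2 = as.numeric(ntile(Petal.Length, 4) == 4)) %>%
  dplyr::rename(predictor_1 = Petal.Width)

proc_auc <- function(outcome_var, predictor_var) {
  pROC::auc(outcome_var, predictor_var)
}

outcome <- c("outcome_1", "outcome_2")

# select() keeps the data frame wrapper; [[ ]] returns the underlying vector
is.data.frame(dplyr::select(df, outcome_1))  # TRUE
is.numeric(df[[outcome[[1]]]])               # TRUE

# Illustrative loop (not from the question): one AUC per outcome name
aucs <- sapply(outcome, function(nm) {
  as.numeric(proc_auc(df[[nm]], df$predictor_1))
})
aucs
```

The result is a numeric vector named after the entries of outcome, so individual values can be looked up by outcome name.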
CodePudding user response:
You'll have to familiarize yourself with dplyr's non-standard evaluation, which makes it pretty hard to program with. In particular, you need to realize that passing a variable name is an indirection, and that there is a special syntax for it.
If you want to stay with the pipes / non-standard evaluation, you can use the roc_ function. The trailing underscore follows an older naming convention for functions that take column names as character strings instead of unquoted column names.
proc_auc2 <- function(data, outcome_var, predictor_var) {
pROC::auc(pROC::roc_(data, outcome_var, predictor_var))
}
At this point you can pass the column names as strings to this new function:
proc_auc2(df, outcome[[1]], "predictor_1")
# or equivalently:
df %>% proc_auc2(outcome[[1]], "predictor_1")
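To cover every outcome in the vector, the same call can be wrapped in a loop. This is only a sketch: it assumes an installed pROC version that provides roc_, it rebuilds the data from the question so that it runs on its own, and the sapply wrapper and the aucs2 name are illustrative.

```r
library(pROC)
library(dplyr)

# Same data preparation as in the question
df <- iris %>%
  dplyr::mutate(outcome_1 = as.numeric(ntile(Sepal.Length, 4) == 4),
                outcome_2 = as.numeric(ntile(Petal.Length, 4) == 4)) %>%
  dplyr::rename(predictor_1 = Petal.Width)

# roc_ takes the response and predictor column names as strings
proc_auc2 <- function(data, outcome_var, predictor_var) {
  pROC::auc(pROC::roc_(data, outcome_var, predictor_var))
}

outcome <- c("outcome_1", "outcome_2")

# Illustrative loop: pass each outcome name as a string
aucs2 <- sapply(outcome, function(nm) {
  as.numeric(proc_auc2(df, nm, "predictor_1"))
})
aucs2
```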
That being said, for most use cases you probably want to follow @druskacik's answer and use standard R evaluation.