ROC for Logistic regression in R

Time:03-06

I would like to ask for help with my project. My goal is to get a ROC curve from an existing logistic regression model.

First of all, here is what I'm analyzing.

glm.fit <- glm(Severity_Binary ~ Side + State + Timezone + Temperature.F. + Wind_Chill.F. + Humidity... + Pressure.in. + Visibility.mi. + Wind_Direction + Wind_Speed.mph. + Precipitation.in. + Amenity + Bump + Crossing + Give_Way + Junction + No_Exit + Railway + Station + Stop + Traffic_Calming + Traffic_Signal + Sunrise_Sunset, data = train_data, family = binomial)

glm.probs <- predict(glm.fit,type = "response")

glm.probs = predict(glm.fit, newdata = test_data, type = "response")
glm.pred = ifelse(glm.probs > 0.5, "1", "0")

This part works fine: I can show a table of predictions and the mean result. Here is where my problem starts. I'm using the pROC library, but I'm open to anything else you can help me with. My test_data has approximately 975 rows, but the variable proc has only 3 sensitivity/specificity values.

library(pROC)
proc <- roc(test_data$Severity_Binary,glm.probs) 

test_data$sens <- proc$sensitivities[1:975] 
test_data$spec <- proc$specificities[1:975]

ggplot(test_data, aes(x=spec, y=sens)) + geom_line()

Here's what I have as a result:

[image]

With this warning message:

Removed 972 row(s) containing missing values (geom_path).

As I said, I found that proc contains only 3 values.

[image]

Answer:

You can't (and shouldn't) assign the sensitivity and specificity to the data. They are summary statistics with one value per threshold, not one per row, so they exist in a different dimension than your data.
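To illustrate the dimension mismatch, here is a minimal base R sketch that computes sensitivity and specificity by hand. The data are simulated stand-ins, not your data, and the names truth, probs and thresholds are made up for the example. The thresholding scheme is a simplification and not necessarily the exact one pROC uses.

```r
# Simulated stand-ins (assumption): 975 outcomes and predicted probabilities
set.seed(1)
n <- 975
truth <- rbinom(n, 1, 0.4)                   # observed 0/1 outcome
probs <- round(plogis(rnorm(n) + truth), 2)  # rounded, so many values repeat

# One candidate threshold per distinct predicted value, plus Inf
thresholds <- c(sort(unique(probs)), Inf)

# Classify as 1 when prob >= threshold, then summarise per threshold
sens <- sapply(thresholds, function(t) mean(probs[truth == 1] >= t))
spec <- sapply(thresholds, function(t) mean(probs[truth == 0] < t))

length(probs)       # one value per row of the data
length(thresholds)  # one value per threshold, a different number
```

The number of ROC points follows the number of distinct predicted values, not the number of rows. Under this scheme, a predictor with only two distinct values would give just three points.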

Specifically, these two lines are wrong and make no sense at all:

test_data$sens <- proc$sensitivities[1:975] 
test_data$spec <- proc$specificities[1:975]

Instead, you must either save them to a new data.frame, or use one of the existing plotting functions.
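Both options could look roughly like the sketch below. It assumes pROC and ggplot2 are installed and uses simulated stand-ins (truth, probs) in place of test_data$Severity_Binary and glm.probs. The choice of ggroc as the ready-made plotting function is my suggestion, and roc_df is a made-up name.

```r
library(pROC)
library(ggplot2)

# Simulated stand-ins (assumption) for the real outcome and predictions
set.seed(1)
truth <- rbinom(975, 1, 0.4)
probs <- plogis(rnorm(975) + truth)

proc <- roc(truth, probs, quiet = TRUE)

# Option 1: keep the summary values in their own data.frame
roc_df <- data.frame(
  sens = proc$sensitivities,
  spec = proc$specificities
)
# geom_path draws the points in the order pROC returns them
p1 <- ggplot(roc_df, aes(x = spec, y = sens)) +
  geom_path() +
  scale_x_reverse()

# Option 2: let pROC build the ggplot object itself
p2 <- ggroc(proc)
```

Printing p1 or p2 draws the curve. With your own objects, the roc() call would be the one you already have, roc(test_data$Severity_Binary, glm.probs).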
