Any help would be appreciated.
I'm trying to plot the ROC curve for each of 80 columns. The code for a single column is below:
pred <- prediction(df$x, label)
perf <- performance(pred,"tpr","fpr")
plot(perf,colorize=TRUE)
Now I would like to put each column in turn in place of df$x, calculate pred and then perf, and plot it (I have to do this for 80 columns).
This is the code that I wrote. I know it's not correct, but I don't know how to make it work:
for (i in 1:ncol(df)){
pred <- prediction(df$x[i], label)
perf <- performance(pred,"tpr","fpr")
plot(perf,colorize=TRUE)}
This is the first few lines of my data:
label x1 x2 x3 x4 x5 x6 x7 x8
1 0 34.96667 41.93333 54.30000 42.93333 24.40000 48.50000 42.73333 33.86667
2 0 79.00000 25.20000 95.43333 75.23333 31.50000 88.96667 83.60000 75.30000
3 0 16.10000 15.80000 17.13333 27.23333 35.10000 18.90000 14.66667 40.00000
4 0 61.90000 23.96667 74.23333 57.23333 45.50000 69.70000 61.80000 58.00000
5 0 31.40000 18.40000 42.16667 41.13333 55.86667 39.90000 32.33333 45.50000
x9 x10 x11 x12 x13 x14 x15 x16
1 11.366667 22983.00 15302.67 111186.67 781.3333 338140.0 2457099 13078.3333
2 20.200000 22750.33 16278.00 118196.67 156.6333 347375.7 2522140 405.6667
3 -10.100000 23812.33 13035.00 90846.67 -1758.3333 371015.7 2583397 11148.6667
4 17.000000 25006.33 16416.67 114940.00 1925.0000 381342.3 2669452 1410.3333
5 1.066667 25351.00 16225.00 110753.33 -152.3667 406208.3 2772717 8366.6667
x17 x18 x19 x20 x21 x22 x23 x24 x25
1 -1674.6667 -1521.033 1674.667 353442.7 2568286 298623.7 12185.000 168.33333 63.86667
2 -2851.3333 -2864.333 2851.333 363654.0 2640337 301270.0 -2288.667 59.33333 56.90000
3 -2305.3333 -2188.333 2305.333 384050.7 2674244 313343.0 7085.000 717.00000 67.33333
4 -2154.6667 -2130.000 2154.667 397758.3 2784392 320309.0 1180.967 167.93333 74.90000
5 -480.6667 -432.000 1313.333 422433.3 2883470 341558.0 7733.333 227.66667 72.60000
This is also the output of dput
structure(list(label = c(0, 0, 0, 0, 0), x1 = c(34.9666666666667,
79, 16.1, 61.9, 31.4), x2 = c(41.9333333333333, 25.2, 15.8, 23.9666666666667,
18.4), x3 = c(54.3, 95.4333333333333, 17.1333333333333, 74.2333333333333,
42.1666666666667), x4 = c(42.9333333333333, 75.2333333333333,
27.2333333333333, 57.2333333333333, 41.1333333333333), x5 = c(24.4,
31.5, 35.1, 45.5, 55.8666666666667)), row.names = c(NA, 5L), class = "data.frame")
Thanks in advance for any help.
CodePudding user response:
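I think you can create empty lists and save the result of each iteration in them.
A plot() call draws on the graphics device and returns nothing useful to store, and the ROCR objects cannot be assigned into the cells of a matrix, so keep pred and perf in lists and draw each plot inside the loop. Also, df$x[i] does not select the i-th column; use df[[i]] or the column name instead. It would be like:
library(ROCR)
pred_list <- list() # empty list for the prediction objects
perf_list <- list() # empty list for the performance objects
for (nm in setdiff(names(df), "label")) { # every column except the label
pred_list[[nm]] <- prediction(df[[nm]], df$label)
perf_list[[nm]] <- performance(pred_list[[nm]], "tpr", "fpr")
plot(perf_list[[nm]], colorize = TRUE, main = nm)
}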
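If the goal is also to keep one number per column, a common choice is the area under the ROC curve (AUC). The following is only a sketch under assumptions: demo_df, x1 and x2 are made-up names and simulated data (the rows shown in the question all have label 0, so they cannot be used directly), and it relies on ROCR's "auc" measure, whose value is stored in the y.values slot of the performance object.

```r
library(ROCR)

# Made-up data with both classes present (an assumption for illustration).
set.seed(1)
n <- 200
demo_df <- data.frame(label = rbinom(n, 1, 0.5))
demo_df$x1 <- demo_df$label + rnorm(n)  # informative predictor
demo_df$x2 <- rnorm(n)                  # pure noise

# Every column except the label is treated as a predictor.
predictors <- setdiff(names(demo_df), "label")

# One AUC value per predictor column.
auc <- vapply(predictors, function(nm) {
  pred <- prediction(demo_df[[nm]], demo_df$label)
  performance(pred, "auc")@y.values[[1]]
}, numeric(1))

# Collect the results in a data frame, best predictor first.
auc_table <- data.frame(variable = predictors, auc = auc, row.names = NULL)
auc_table <- auc_table[order(-auc_table$auc), ]
auc_table
```

With 80 columns, a table like this makes it easier to decide which of the ROC curves are worth looking at in detail.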
CodePudding user response:
Here is complete code to plot all ROC curves of a data set df with the same structure as the data set in the question. I first create a data set because the one in the question has only one class (label is always 0). Then,
- get the current directory and switch to a temporary directory to save the graphics files;
- in the for loop, compute the predictions and their performance;
- open a graphics device with png();
- plot the performance, saving it to disk, and close the device.
There are now as many "Perf_X?.png" files as variables "X?" in the data.frame. The png file related instructions can be removed but, with 80 plots, it's better to save them and view them one by one later.
library(ROCR)
# Make up a data set
set.seed(2022)
data(ROCR.simple)
df <- do.call(cbind.data.frame, ROCR.simple[2:1])
df <- cbind(df, replicate(5, runif(nrow(df))))
names(df) <- c("label", paste0("X", seq.int(ncol(df) - 1)))
old_dir <- getwd()
TempDir <- tempdir()
dir.exists(TempDir)
#> [1] TRUE
setwd(TempDir)
for (i in seq_len(ncol(df))[-1]){
pred <- prediction(df[[i]], df$label)
perf <- performance(pred, "tpr", "fpr")
# save to PNG file, names are "Perf_%s.png" with
# the format string %s becoming the column name
filename <- sprintf("Perf_%s.png", names(df)[i])
png(filename = filename)
plot_title <- paste("Variable:", names(df)[i])
plot(perf, main = plot_title, colorize = TRUE)
dev.off()
}
plots_vec <- list.files(path = TempDir, pattern = "Perf_.*\\.png")
plots_vec
#> [1] "Perf_X1.png" "Perf_X2.png" "Perf_X3.png" "Perf_X4.png" "Perf_X5.png"
#> [6] "Perf_X6.png"
Created on 2022-03-19 by the reprex package (v2.0.1)
Final clean up
setwd(old_dir)
unlink(file.path(TempDir, plots_vec))
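As a possible variation on the code above, all curves can go into a single multi-page PDF instead of one PNG file per variable. This is a minimal sketch, not a replacement for the answer: demo_df and the file name roc_curves.pdf are made up for illustration, and the full path is built with file.path() so that the working directory does not have to be changed with setwd().

```r
library(ROCR)

# Made-up data with both classes present (an assumption for illustration).
set.seed(2)
n <- 200
demo_df <- data.frame(label = rbinom(n, 1, 0.5))
demo_df$X1 <- demo_df$label + rnorm(n)  # informative predictor
demo_df$X2 <- rnorm(n)                  # pure noise

# Hypothetical output file, written to the session's temporary directory.
pdf_file <- file.path(tempdir(), "roc_curves.pdf")

pdf(pdf_file)  # with the default onefile = TRUE, each plot goes on a new page
for (nm in setdiff(names(demo_df), "label")) {
  pred <- prediction(demo_df[[nm]], demo_df$label)
  perf <- performance(pred, "tpr", "fpr")
  plot(perf, main = paste("Variable:", nm), colorize = TRUE)
}
dev.off()

file.exists(pdf_file)
```

A single file is easier to page through when there are 80 variables, while separate PNG files are easier to embed one at a time in a report.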