Repeat an operation multiple times and store the output from each trial

Dataset (simplified)

data <- data.frame(
  Group    = c("NO CB", "NO CB", "NO CB", "NO CB", "CB", "CB"),
  Region.1 = c(1.13, 2.45, 3.56, 3.67, 1.18, 2.67),
  Region.2 = c(4.56, 7.54, 9.56, 7.89, 5.85, 7.86)
)

In this dataset, the 'NO CB' group has more rows than the 'CB' group, so the groups are unbalanced. What I want my code to do is randomly select 2 rows from the 'NO CB' group, combine them with the rows from the 'CB' group, use these rows to train my random forest model, and make predictions. (I know 4 rows in total makes a bad predictive model; my actual dataset has hundreds of rows, but only a few are reproduced here for simplicity.)
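The sampling step described above can be sketched in base R as follows. This is a minimal illustration, not the code from the question: balance_groups() is a hypothetical helper, and instead of the hard-coded 2 it assumes you want as many 'NO CB' rows as there are 'CB' rows (which is 2 in the toy data).

```r
# Toy data from the question
data <- data.frame(
  Group    = c(rep("NO CB", 4), rep("CB", 2)),
  Region.1 = c(1.13, 2.45, 3.56, 3.67, 1.18, 2.67),
  Region.2 = c(4.56, 7.54, 9.56, 7.89, 5.85, 7.86)
)

# Hypothetical helper: downsample the majority group to the size of the minority group
balance_groups <- function(df, majority = "NO CB", minority = "CB") {
  maj_rows <- df[df$Group == majority, ]
  min_rows <- df[df$Group == minority, ]
  # sample() without replacement draws a new subset of majority rows on every call
  picked <- maj_rows[sample(nrow(maj_rows), nrow(min_rows)), ]
  rbind(picked, min_rows)
}

set.seed(1)  # only to make this example reproducible
balanced <- balance_groups(data)
table(balanced$Group)
```

Because sample() is called inside the helper, each call returns a different random pair of 'NO CB' rows, which is what the repeated runs need.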

I wrote the function below. I want to repeat the whole process 500 times: randomly select 2 rows from the 'NO CB' group 500 times and, each time, run the random forest classification, extract the AUC value on the test set, and store the AUC value from each run.

library(caret)         # createDataPartition()
library(randomForest)  # randomForest()
library(pROC)          # roc()

myfun <- function(){
  wocb.ROI <- subset(data, Group == "NO CB")
  wcb.ROI <- subset(data, Group == "CB")
  wocb.ROI <- wocb.ROI[sample(nrow(wocb.ROI), 2), ] # randomly sample 2 rows from the NO CB group
  same.ROI <- rbind(wocb.ROI, wcb.ROI)
  same.ROI$Group <- as.factor(same.ROI$Group)        # levels: "CB", "NO CB"
  trains.same.ROI <- createDataPartition(
    y = same.ROI$Group,
    p = 0.5, # proportion of rows used for training
    list = FALSE
  )

  traindata.same.ROI <- same.ROI[trains.same.ROI, ]
  testdata.same.ROI <- same.ROI[-trains.same.ROI, ]
  # builds the formula Group~Region.1+Region.2
  form_cls.same.ROI <- as.formula(
    paste0(
      "Group~",
      paste(colnames(traindata.same.ROI)[2:3], collapse = "+")
    )
  )

  fit.rf.cls.same.ROI <- randomForest(
    form_cls.same.ROI,
    data = traindata.same.ROI,
    ntree = 50,  # number of decision trees
    mtry = 6,    # reset (with a warning) if larger than the number of predictors
    importance = TRUE
  )

  # column 2 of the probability matrix is the probability of "NO CB"
  trainpredprob.same.ROI <- predict(fit.rf.cls.same.ROI, newdata = traindata.same.ROI, type = "prob")
  trainroc.same.ROI <- roc(response = traindata.same.ROI$Group,
                           predictor = trainpredprob.same.ROI[, 2])

  # best threshold by Youden's index: sensitivity + specificity - 1
  bestp.same.ROI <- trainroc.same.ROI$thresholds[
    which.max(trainroc.same.ROI$sensitivities + trainroc.same.ROI$specificities - 1)]

  trainpredlab.same.ROI <- as.factor(
    ifelse(trainpredprob.same.ROI[, 2] > bestp.same.ROI, "NO CB", "CB")
  )

  testpredprob.same.ROI <- predict(fit.rf.cls.same.ROI, newdata = testdata.same.ROI, type = "prob")
  testpredlab.same.ROI <- as.factor(
    ifelse(testpredprob.same.ROI[, 2] > bestp.same.ROI, "NO CB", "CB")
  )

  testroc.same.ROI <- roc(response = testdata.same.ROI$Group,
                          predictor = testpredprob.same.ROI[, 2])

  auc <- as.numeric(testroc.same.ROI$auc) # plain number instead of a pROC "auc" object
  return(auc)
}
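Two steps of this function can be checked in isolation with base R. The sketch below builds the model formula with reformulate() instead of pasting strings together, and picks the threshold that maximises Youden's index (sensitivity + specificity - 1) with which.max(). The threshold, sensitivity and specificity vectors are made-up stand-ins for the thresholds, sensitivities and specificities fields of a pROC roc object, not real results.

```r
# Build Group ~ Region.1 + Region.2 from column names
predictors <- c("Region.1", "Region.2")
form <- reformulate(predictors, response = "Group")
print(form)

# Made-up stand-ins for roc_obj$thresholds, $sensitivities, $specificities
thresholds    <- c(-Inf, 0.25, 0.50, 0.75, Inf)
sensitivities <- c(1.00, 1.00, 1.00, 0.50, 0.00)
specificities <- c(0.00, 0.50, 1.00, 1.00, 1.00)

# Youden's J for each candidate threshold; which.max() returns the first maximum
youden <- sensitivities + specificities - 1
best_threshold <- thresholds[which.max(youden)]
best_threshold
```

reformulate() inserts the + signs itself, so the formula cannot end up with space-separated terms that fail to parse.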

I then tried result <- replicate(500, myfun), but all I got was my code instead of a data frame containing the AUC values.

I also tried writing loops, but I am not sure how I should adjust my code to make it run.
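For the loop approach, a minimal sketch is to preallocate a numeric vector and fill one slot per iteration. Here one_run() is a hypothetical stand-in for myfun() that returns a random value instead of a real AUC, so the loop structure runs without randomForest, caret or pROC.

```r
# Hypothetical stand-in for myfun(): returns a random number in place of a test AUC
one_run <- function() runif(1, min = 0.5, max = 1)

n_runs <- 500
auc_values <- numeric(n_runs)        # preallocate one slot per run
for (i in seq_len(n_runs)) {
  auc_values[i] <- one_run()         # the () calls the function on every iteration
}
summary(auc_values)
```

With the real function, the line inside the loop would be auc_values[i] <- myfun().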

I have checked similar posts; in fact, the idea of repeating the function 500 times was inspired by one of them, but my problem is still not solved. Why does my result return the complete code instead of the AUC values?

How should I adapt my code to repeat the whole process many times? Thanks in advance for your help!

Answer:

One solution is to use one of the apply functions, such as lapply. This way you can also keep track of how many times the function has run and, in the end result, see which run gave which output.

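myfun <- function(i) {
  message("Randomforest run ", i)
  # body of the original myfun() goes here; it must end by computing 'auc'
  return(auc)
}

res <- lapply(1:500, myfun)   # list of 500 results; res[[i]] is the AUC from run i
auc_values <- unlist(res)     # flatten to a numeric vector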
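As for why replicate(500, myfun) printed the code: myfun without parentheses is the function object itself, so replicate() evaluates that name 500 times and returns 500 copies of the function. replicate(500, myfun()) calls the function instead. The sketch below uses a hypothetical stand-in, myfun_stub(), that returns a random number in place of the test AUC so it runs without randomForest, caret or pROC. It also shows one way (an addition, not part of the answer above) to collect the run number and AUC into a data frame.

```r
# Hypothetical stand-in for the question's myfun(); any function returning one number works
myfun_stub <- function() runif(1)

# Without (): each repetition just evaluates the name, giving a list of function objects
wrong <- replicate(3, myfun_stub)

# With (): the function is called on every repetition and its value is kept
right <- replicate(500, myfun_stub())

# lapply() version that keeps the run number next to each value, as a data frame
res <- lapply(seq_len(500), function(i) data.frame(run = i, auc = myfun_stub()))
auc_table <- do.call(rbind, res)
head(auc_table)
```

With the real function, wrapping the result in as.numeric(), e.g. as.numeric(myfun()), turns pROC's auc object into a plain number before it is stored.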