I take my available data and filter it according to some criteria (removing rows based on the value of a certain column). Then I train a model on the filtered data. Later, I take the same data from the start again and test the model, filtering either with the same criteria as before or with different ones. Then I make ROC and waterfall plots. My problem is that I want to run this for every combination of elements from two lists. For example:
list1 = list(c('a','b','c'),c('A','B','C'))
list2 = list(c('x','y','z'),c('X','Y','Z'))
I want a for loop to run the analysis with c('a','b','c') and c('x','y','z'), then c('a','b','c') and c('X','Y','Z'), then c('A','B','C') and c('x','y','z'), and finally c('A','B','C') and c('X','Y','Z').
This is my code. I know that use_train and use_test are identical right now; that is only temporary, and I find two lists easier to handle than one. Every model and every plot is stored in lists that I create before the for loop. Should I nest one for loop inside another?
use_train = list(c('CR','PR','SD'),c('CR','PR','SD','PD')) # criteria used to train the ML model
use_test = list(c('CR','PR','SD'), c('CR','PR','SD','PD')) # criteria used to test the ML model
xgb_models = auc_test = auc_test_plot = data_list = waterfall = list()
for (i in seq_along(use_train)) {
  data_list[[i]] = create_data(mydata, metadata,
                               recist.use = use_train[[i]], case = 'CR', use_batch = FALSE, seed = 40)
  xgb_models[[i]] = train_ici(data_list[[i]])
  #parallelStop()
  auc_test[[i]] = evaluate_model(xgb_models[[i]], mydata, metadata,
                                 recist.use = use_test[[i]], case = 'CR', use_batch = FALSE, seed = 40)
  auc_test_plot[[i]] = evaluate_model_plot(xgb_models[[i]], data_list[[i]][[2]])
  waterfall[[i]] = waterfall(xgb_models[[i]], metadata, data_list[[i]][[2]], case = 'CR',
                             train.recist = use_train[[i]], test.recist = use_test[[i]])
}
So at the end, I'll have 4 rounds:
- from use_train: c('CR','PR','SD') and from use_test: c('CR','PR','SD')
- from use_train: c('CR','PR','SD') and from use_test: c('CR','PR','SD','PD')
- from use_train: c('CR','PR','SD','PD') and from use_test: c('CR','PR','SD')
- from use_train: c('CR','PR','SD','PD') and from use_test: c('CR','PR','SD','PD')
CodePudding user response:
In R, one usually tries to avoid for loops when the problem is embarrassingly parallelizable. Instead, you can build a data frame holding all combinations with expand.grid() and add a column with the corresponding result:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
list1 <- list(c('a','b','c'), c('A','B','C'))
list2 <- list(c('x','y','z'), c('X','Y','Z'))
# do some stuff with the 2 vars
do_stuff <- function(l1, l2) {
  length(l1) + length(l2) + runif(1)
}
# one row per combination; rowwise() makes mutate() call do_stuff() once per row
expand.grid(list1, list2) |>
  rowwise() |>
  mutate(result = do_stuff(Var1, Var2))
#> # A tibble: 4 × 3
#> # Rowwise:
#> Var1 Var2 result
#> <list> <list> <dbl>
#> 1 <chr [3]> <chr [3]> 6.43
#> 2 <chr [3]> <chr [3]> 6.91
#> 3 <chr [3]> <chr [3]> 6.26
#> 4 <chr [3]> <chr [3]> 6.08
Created on 2023-01-07 by the reprex package (v2.0.1)
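If you would rather stay in base R, the same idea can be sketched with a grid of indices instead of a grid of list elements. This is only a minimal sketch: run_one() is a hypothetical placeholder standing in for the real train/evaluate pipeline from the question, and the result names are made up for illustration.

```r
use_train <- list(c('CR','PR','SD'), c('CR','PR','SD','PD'))
use_test  <- list(c('CR','PR','SD'), c('CR','PR','SD','PD'))

# Hypothetical placeholder: replace the body with the real
# create_data() / train_ici() / evaluate_model() calls.
run_one <- function(train, test) {
  paste0("train on ", length(train), " classes, test on ", length(test))
}

# One row per (train, test) pair, stored as indices into the two lists.
combos <- expand.grid(train = seq_along(use_train),
                      test  = seq_along(use_test))

# Map() walks both index columns in parallel, one call per row of combos.
results <- Map(function(i, j) run_one(use_train[[i]], use_test[[j]]),
               combos$train, combos$test)

# Label every result with the pair of indices it came from.
names(results) <- paste0("train", combos$train, "_test", combos$test)
```

Keeping indices in the grid means combos stays an ordinary data frame of integers, so it is easy to print and to match each result back to its criteria.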
CodePudding user response:
Here is my suggestion. I use lapply and unlist to build a flat list of train/test pairs, then a single lapply over that list that returns all results for each pair, instead of appending to several separate lists.
use_train <- list(c('CR','PR','SD'),c('CR','PR','SD','PD')) # criteria used to train the ML model
use_test <- list(c('CR','PR','SD'), c('CR','PR','SD','PD')) # criteria used to test the ML model
# flatten one level only, so each element is a list(train = ..., test = ...)
train_test <- unlist(
  lapply(use_train, \(x) lapply(use_test, \(y) list(train = x, test = y))),
  recursive = FALSE)
output <- lapply(train_test, function(tt) {
  data_list <- create_data(mydata, metadata,
    recist.use = tt$train, case = 'CR', use_batch = FALSE, seed = 40)
  xgb_models <- train_ici(data_list)
  auc_test <- evaluate_model(
    xgb_models, mydata, metadata,
    recist.use = tt$test, case = 'CR', use_batch = FALSE, seed = 40)
  auc_test_plot <- evaluate_model_plot(
    xgb_models, data_list[[2]])
  waterfall <- waterfall(
    xgb_models, metadata, data_list[[2]], case = 'CR',
    train.recist = tt$train, test.recist = tt$test)
  return(list(
    data_list = data_list, xgb_models = xgb_models, auc_test = auc_test,
    auc_test_plot = auc_test_plot, waterfall = waterfall))
})
output
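To find a given combination in the result afterwards, it may help to name the pairs before running lapply, since lapply carries the names over to its output. The following is a minimal sketch with a placeholder in place of the real pipeline: the n_train and n_test fields and the label format are assumptions made for illustration, not part of the original code.

```r
use_train <- list(c('CR','PR','SD'), c('CR','PR','SD','PD'))
use_test  <- list(c('CR','PR','SD'), c('CR','PR','SD','PD'))

train_test <- unlist(
  lapply(use_train, \(x) lapply(use_test, \(y) list(train = x, test = y))),
  recursive = FALSE)

# Build a readable label such as "CR+PR+SD -> CR+PR+SD+PD" for every pair.
names(train_test) <- vapply(
  train_test,
  \(tt) paste0(paste(tt$train, collapse = "+"), " -> ",
               paste(tt$test, collapse = "+")),
  character(1))

# Placeholder body (assumption): the real function would train and evaluate.
output <- lapply(train_test, function(tt) {
  list(n_train = length(tt$train), n_test = length(tt$test))
})

# Pull one component out of every combination, keeping the labels.
n_test_by_combo <- sapply(output, \(res) res$n_test)
```

With the real pipeline, the same pattern would let you write something like output[["CR+PR+SD -> CR+PR+SD+PD"]]$auc_test instead of remembering which position each combination has.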