How to take each combination from two lists?

I take my data and filter it according to some criteria (removing rows based on the value of a certain column). Then I train a model on this data. Later, I take the same data again from the start, but this time I test the model using either the same criteria as before or different ones. Then I make ROC and waterfall plots. My problem is that I want to run this for every combination of elements from two lists. For example:

list1 = list(c('a','b','c'),c('A','B','C'))
list2 = list(c('x','y','z'),c('X','Y','Z'))

I want a for loop that runs the analysis with c('a','b','c') and c('x','y','z'), then with c('a','b','c') and c('X','Y','Z'), then with c('A','B','C') and c('x','y','z'), and finally with c('A','B','C') and c('X','Y','Z').
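A nested loop visits the pairs in exactly this order: the outer loop walks list1 and the inner loop walks list2. Below is a minimal sketch on the toy lists; the names `pairs` and `k` are illustrative, and the comment marks where the real analysis would go.

```r
list1 <- list(c('a','b','c'), c('A','B','C'))
list2 <- list(c('x','y','z'), c('X','Y','Z'))

pairs <- list()   # one entry per combination
k <- 0            # running index into `pairs`
for (i in seq_along(list1)) {        # outer loop: first list
  for (j in seq_along(list2)) {      # inner loop: second list
    k <- k + 1
    # the real analysis would use list1[[i]] and list2[[j]] here
    pairs[[k]] <- list(first = list1[[i]], second = list2[[j]])
  }
}

# Print the first element of each vector to show the visiting order
# prints: a x, a X, A x, A X (one pair per line)
for (p in pairs) cat(p$first[1], p$second[1], "\n")
```

Using a separate counter `k` keeps all results in one flat list, so four combinations give four entries instead of a list of lists.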

This is my code. I know use_train and use_test are currently identical; they will not stay that way, this is just for now, and it is easier for me to handle two lists instead of one. Every model and every plot is stored in the lists I create before the for loop. Should I use a for loop inside another for loop?

use_train = list(c('CR','PR','SD'),c('CR','PR','SD','PD')) # criteria used to train the ML model
use_test = list(c('CR','PR','SD'), c('CR','PR','SD','PD')) # criteria used to test the ML model

xgb_models = auc_test = auc_test_plot = data_list = waterfall = list() 

for(i in 1:length(use_train)){
  
  data_list[[i]] = create_data(mydata,metadata, 
                                  recist.use = use_train[[i]], case = 'CR', use_batch = FALSE, seed=40)
  
  xgb_models[[i]] = train_ici(data_list[[i]])
  #parallelStop()
  
  auc_test[[i]] = evaluate_model(xgb_models[[i]], mydata, metadata, 
                         recist.use = use_test[[i]], case = 'CR' , use_batch = FALSE, seed = 40)
  
  auc_test_plot[[i]] = evaluate_model_plot(xgb_models[[i]], data_list[[i]][[2]])
  
  waterfall[[i]] = waterfall(xgb_models[[i]], metadata, data_list[[i]][[2]], case  = 'CR',
                                train.recist = use_train[[i]], test.recist = use_test[[i]])
}

So, in the end, I want four rounds:

from use_train: c('CR','PR','SD') and from use_test: c('CR','PR','SD')
from use_train: c('CR','PR','SD') and from use_test: c('CR','PR','SD','PD')
from use_train: c('CR','PR','SD','PD') and from use_test: c('CR','PR','SD')
from use_train: c('CR','PR','SD','PD') and from use_test: c('CR','PR','SD','PD')
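These four rounds are simply the rows of a grid of index pairs. A sketch under the assumption that indexing into use_train and use_test is all you need: expand.grid() varies its first argument fastest, so putting `test` first makes `train` the slow index and reproduces the order listed above. The `rounds` table and the printed labels are illustrative only.

```r
use_train <- list(c('CR','PR','SD'), c('CR','PR','SD','PD'))
use_test  <- list(c('CR','PR','SD'), c('CR','PR','SD','PD'))

# expand.grid() varies its FIRST argument fastest, so `test` goes first
# to make `train` the slow (outer) index, matching the order of the rounds
rounds <- expand.grid(test = seq_along(use_test), train = seq_along(use_train))

for (r in seq_len(nrow(rounds))) {
  train <- use_train[[rounds$train[r]]]
  test  <- use_test[[rounds$test[r]]]
  # the real pipeline (create_data, train_ici, ...) would run here
  cat("round", r, ": train =", paste(train, collapse = ","),
      "| test =", paste(test, collapse = ","), "\n")
}
```

A side benefit of this design is that `rounds` doubles as a lookup table: result number `r` in a results list belongs to `rounds[r, ]`.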

Answer:

In R, explicit for loops are often avoided when each iteration is independent of the others (an embarrassingly parallel problem). Instead, you can use expand.grid() to create a data frame holding all combinations, and then add a column with the result for each combination:

library(dplyr)

list1 <- list(c('a','b','c'), c('A','B','C'))
list2 <- list(c('x','y','z'), c('X','Y','Z'))

# do some stuff with the 2 vars (placeholder: sum of lengths plus random noise)
do_stuff <- function(l1, l2) {
  length(l1) + length(l2) + runif(1)
}

expand.grid(list1, list2) |>
  rowwise() |>
  mutate(result = do_stuff(Var1, Var2))
#> # A tibble: 4 × 3
#> # Rowwise: 
#>   Var1      Var2      result
#>   <list>    <list>     <dbl>
#> 1 <chr [3]> <chr [3]>   6.43
#> 2 <chr [3]> <chr [3]>   6.91
#> 3 <chr [3]> <chr [3]>   6.26
#> 4 <chr [3]> <chr [3]>   6.08

Created on 2023-01-07 by the reprex package (v2.0.1)
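The same idea also works in base R without dplyr, using mapply() over the two list columns that expand.grid() produces. In this sketch, `first_pair()` is a hypothetical, deterministic stand-in for do_stuff() so the result is reproducible; it just pastes the first element of each vector.

```r
list1 <- list(c('a','b','c'), c('A','B','C'))
list2 <- list(c('x','y','z'), c('X','Y','Z'))

# hypothetical stand-in for the real analysis
first_pair <- function(l1, l2) paste(l1[1], l2[1])

g <- expand.grid(list1, list2)   # Var1 and Var2 are list columns
# mapply() calls first_pair(g$Var1[[i]], g$Var2[[i]]) for each row i
g$result <- mapply(first_pair, g$Var1, g$Var2)
g$result
```

Note that expand.grid() varies Var1 fastest, so the rows come out as a x, A x, a X, A X. All four combinations are present, just not in the order from the question.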

Answer:

Here is my suggestion. I use lapply and unlist to build a flat list of all train/test pairs, then a second lapply over that list instead of appending to separate lists inside a loop.

use_train <- list(c('CR','PR','SD'),c('CR','PR','SD','PD')) # criteria used to train the ML model
use_test <- list(c('CR','PR','SD'), c('CR','PR','SD','PD')) # criteria used to test the ML model

train_test <- unlist(lapply(use_train, \(x) lapply(use_test, \(y) list(
  train = x, test = y))), recursive = FALSE)

output <- lapply(train_test, function(tt) {
  data_list <- create_data(mydata,metadata, 
           recist.use = tt$train, case = 'CR', use_batch = FALSE, seed=40)
  
  xgb_models <- train_ici(data_list)

  auc_test <- evaluate_model(
    xgb_models, mydata, metadata, 
    recist.use = tt$test, case = 'CR' , use_batch = FALSE, seed = 40)
  
  auc_test_plot <- evaluate_model_plot(
    xgb_models, data_list[[2]])
  
  # a distinct local name avoids shadowing the waterfall() function
  waterfall_plot <- waterfall(
    xgb_models, metadata, data_list[[2]], case = 'CR',
    train.recist = tt$train, test.recist = tt$test)
  
  return(list(
    data_list = data_list, xgb_models = xgb_models, auc_test = auc_test,
    auc_test_plot = auc_test_plot, waterfall = waterfall_plot))
})

output
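Because lapply() keeps the names of its input, naming each train/test pair makes the results easy to look up later. A sketch under assumptions: `run_round()` is a hypothetical stand-in for the real pipeline (create_data, train_ici, evaluate_model, ...), and the naming scheme is just one possible choice.

```r
use_train <- list(c('CR','PR','SD'), c('CR','PR','SD','PD'))
use_test  <- list(c('CR','PR','SD'), c('CR','PR','SD','PD'))

# Flat list of all train/test pairs in train-major order, as in the answer
train_test <- unlist(lapply(use_train, function(x)
  lapply(use_test, function(y) list(train = x, test = y))), recursive = FALSE)

# Name each pair after its criteria so results can be looked up by name
names(train_test) <- vapply(train_test, function(tt)
  paste0("train_", paste(tt$train, collapse = "-"),
         "__test_", paste(tt$test, collapse = "-")), character(1))

# hypothetical stand-in for the real pipeline
run_round <- function(tt) list(n_train = length(tt$train), n_test = length(tt$test))

output <- lapply(train_test, run_round)   # lapply() keeps the names
names(output)
output[["train_CR-PR-SD__test_CR-PR-SD-PD"]]$n_test
```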