Is there a way to stack models trained with different data sets with the stacks package in R?

Time:12-15

Briefly, I am working with data sets from two different countries. My aim is to ensemble the models for both countries to see how well the ensemble generalizes.

My set-up is: I have trained one workflow_set for each country (10 model specifications with resampling and a grid search of size 20).

This is the error I get when trying to add them as candidates:

predictions <- stacks() %>% 
  add_candidates(wf_set_1) %>% 
  add_candidates(wf_set_2)

Error: It seems like the new candidate member 'Logistic Regression' doesn't make use of the same resampling object as the existing candidates.

CodePudding user response:

Thanks for the question!

Unfortunately, we don't support ensembling models trained on different data sets in stacks. There are a few operations that are no longer well-defined when this is the case.

Given your description of the problem, though, this sounds like a setting where, rather than fitting a model for each country, you would include country as a feature in one model fit across both countries. For any covariate x_i whose effect you think may depend on country, you can create an interaction term with step_interact(terms = ~ x_i:starts_with("country")), after dummy-encoding country with step_dummy(country).
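
As a rough sketch of that pooled approach, the recipe below treats country as a predictor and adds an interaction with one covariate. The data frame pooled, its columns x1 and x2, and the outcome column are all made up for illustration; only recipe(), step_dummy(), step_interact(), prep() and bake() come from the recipes package. It assumes the rows from both countries have already been combined into one data frame.

```r
library(recipes)

set.seed(1)

# Hypothetical pooled data: rows from both countries in a single data frame.
pooled <- data.frame(
  country = factor(rep(c("A", "B"), each = 50)),
  x1      = rnorm(100),
  x2      = rnorm(100)
)
pooled$outcome <- factor(ifelse(pooled$x1 + rnorm(100) > 0, "yes", "no"))

rec <- recipe(outcome ~ ., data = pooled) %>%
  # Convert the country factor to indicator column(s) first,
  # since step_interact() is intended for numeric columns.
  step_dummy(country) %>%
  # Let the effect of x1 differ by country.
  step_interact(terms = ~ x1:starts_with("country"))

# Inspect the columns the recipe produces on the training data.
baked <- bake(prep(rec), new_data = NULL)
names(baked)
```

A recipe like this can then go into a single workflow set that is resampled once on the pooled data, so every candidate passed to add_candidates() shares the same resampling object.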
