Moving multiple classification model training and test

Time:09-30

Suppose I have a two-year dataset (24 months) and a monthly business cycle, so I have to deliver classification model scores each month. I think the best way to train and evaluate a model for this is the following approach:

  • Train months 1-12, test month 13
  • Train months 1-13, test month 14
  • Train months 1-14, test month 15
  • ...
  • Train months 1-23, test month 24

This would give me 12 different results. Is there a name for this kind of training? I'm thinking of implementing it myself, but it would be really helpful if there were a package (or at least a name) for this that takes as input the ML algorithm, pipeline, or CV search I want to try for each training run.

If there is a package or a simple way to do this, is it also possible to use a fixed 12-month window, like this?

  • Train months 1-12, test month 13
  • Train months 2-13, test month 14
  • Train months 3-14, test month 15
  • ...
  • Train months 12-23, test month 24

And if that's possible too, can the training be weighted so that the most recent months count more in the model?

CodePudding user response:

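In general, this is called rolling (or walk-forward) cross-validation; the first scheme in the question, where the training set keeps growing, is the expanding-window variant. Scikit-learn has a class for this, TimeSeriesSplit.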

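See the output from their example:

>>> import numpy as np
>>> from sklearn.model_selection import TimeSeriesSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> tscv = TimeSeriesSplit()  # default n_splits=5
>>> for train_index, test_index in tscv.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]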
TRAIN: [0] TEST: [1]
TRAIN: [0 1] TEST: [2]
TRAIN: [0 1 2] TEST: [3]
TRAIN: [0 1 2 3] TEST: [4]
TRAIN: [0 1 2 3 4] TEST: [5]
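
For the monthly setup in the question, one possible sketch is to run TimeSeriesSplit over the unique months rather than over the rows, then map each month split back to row indices. Everything here is an assumption for illustration: the synthetic data, the helper names monthly_splits and recency_weights, the 6-month half-life, and LogisticRegression as a stand-in model. Passing window=12 (forwarded to max_train_size) should give the fixed 12-month window; leaving it as None gives the expanding one. The recency weighting relies on the sample_weight argument of fit, which many, but not all, scikit-learn estimators accept.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 24 months, 50 rows per month.
months = np.repeat(np.arange(1, 25), 50)
X = rng.normal(size=(months.size, 3))
y = (X[:, 0] + rng.normal(size=months.size) > 0).astype(int)


def monthly_splits(months, n_test_months=12, window=None):
    """Yield (train_rows, test_rows) with exactly one test month per split.

    window=None -> expanding training set (months 1-12, 1-13, ...).
    window=12   -> rolling 12-month training set (months 1-12, 2-13, ...).
    """
    unique_months = np.unique(months)
    tscv = TimeSeriesSplit(
        n_splits=n_test_months, test_size=1, max_train_size=window
    )
    # Split the months themselves, then select the rows of those months.
    for train_m, test_m in tscv.split(unique_months):
        train_rows = np.flatnonzero(np.isin(months, unique_months[train_m]))
        test_rows = np.flatnonzero(np.isin(months, unique_months[test_m]))
        yield train_rows, test_rows


def recency_weights(train_months, half_life=6.0):
    """Exponential decay: rows half_life months older than the newest
    training month get half the weight of the newest rows."""
    age = train_months.max() - train_months
    return 0.5 ** (age / half_life)


scores = []
for train_rows, test_rows in monthly_splits(months, window=12):
    model = LogisticRegression()
    model.fit(
        X[train_rows],
        y[train_rows],
        sample_weight=recency_weights(months[train_rows]),
    )
    # Accuracy on the single held-out month.
    scores.append(model.score(X[test_rows], y[test_rows]))
```

After the loop, scores holds 12 accuracies, one per test month (13 to 24). The test_size argument needs scikit-learn 0.24 or later. To plug in a pipeline or a search object instead of a single model, the same loop should work as long as its fit method forwards sample_weight to the final estimator.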