I would like to use scikit-learn to predict a variable y from X. I want to train a classifier on a training dataset using cross-validation and then apply this classifier to an unseen test dataset (as in https://www.nature.com/articles/s41586-022-04492-9).
from sklearn import datasets
from sklearn.model_selection import cross_validate
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
# Import dataset
X, y = datasets.load_iris(return_X_y=True)
# Make y binary by merging class 0 into class 1 (labels become 1 and 2)
y[y == 0] = 1
# Split into train and test sets (an integer test_size is an absolute count: 75 of the 150 samples)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=75, random_state=4, stratify=y)
# Define the classifier
model = SVC()
# Cross validation on the train data
cv_model = cross_validate(model, x_train, y_train, cv=5)
Now I would like to apply this cross-validated model to the unseen test set, but I cannot find out how.
It would be something like:
result = cv_model.score(x_test, y_test)
Except this does not work.
Answer:
You cannot do that; you need to fit the model before using it to predict new data. cross_validate is just a convenience function to get the scores; as the documentation states, it returns only that, i.e. a dict of scores, and not a (fitted) model:
Evaluate metric(s) by cross-validation and also record fit/score times.
[...]
Returns: scores : dict of float arrays of shape (n_splits,)
Array of scores of the estimator for each run of the cross validation.
A dict of arrays containing the score/time arrays for each scorer is returned.
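To make the point above concrete, here is a minimal sketch of one common workflow, assuming the same iris setup as in the question: use cross_validate only to estimate performance on the training data, then fit the classifier explicitly on the whole training set and score it once on the held-out test set. The choice of SVC() with default parameters is an assumption taken from the question's import, not a recommendation.

```python
from sklearn import datasets
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.svm import SVC

# Same data preparation as in the question
X, y = datasets.load_iris(return_X_y=True)
y[y == 0] = 1  # merge class 0 into class 1, leaving labels 1 and 2

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=75, random_state=4, stratify=y
)

# Assumption: a default SVC, since the question imports it
model = SVC()

# Step 1: cross-validation on the training data only.
# cross_validate fits clones of `model`, so `model` itself stays unfitted.
cv_results = cross_validate(model, x_train, y_train, cv=5)
print("CV accuracy per fold:", cv_results["test_score"])

# Step 2: fit the estimator on the full training set
model.fit(x_train, y_train)

# Step 3: evaluate once on the unseen test set
test_accuracy = model.score(x_test, y_test)
print("Test accuracy:", test_accuracy)
```

If the per-fold models themselves are needed, cross_validate also accepts return_estimator=True, in which case the returned dict gains an "estimator" entry holding one fitted estimator per split; each of those was trained on only part of the training data, so refitting on the full training set as above is usually the simpler choice.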