Using sklearn's cross_val_score with different training and testing datasets

Time:06-18

I have a quick question about the following short snippet of code (I am using sklearn 1.1.1, from which cross_val_score and LinearDiscriminantAnalysis are imported):

cv_results = cross_val_score(LinearDiscriminantAnalysis(),data,isTarget,cv=kfold,scoring='accuracy')

I am trying to train a LinearDiscriminantAnalysis ML algorithm on the 'data' variable and the 'isTarget' variable, which are, respectively, a numpy array of the features of the samples in my ML dataset and a list of which samples are targets (1) or non-targets (0). kfold is just a method for scoring the algorithm; it isn't important here.

My question is this: I am trying to score this algorithm by training it on 'data' and 'isTarget', but I would like to test it on a different dataset, 'data_val' and 'isTarget_val'. However, cross_val_score does not have parameters for training an algorithm on one dataset and testing it on another. I've been searching for other functions that will do this; I feel that the answer is really simple and I just can't find it.

Can someone help me out? Thanks :)

CodePudding user response:

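This is how cross-validation is designed to work. The cv argument you are supplying specifies that you want to do K-Fold cross-validation: the dataset you pass in is split into K folds, and each fold serves once as the test set while the remaining folds are used for training. Over the K rounds, the entirety of your dataset is therefore used for both training and testing, which is why cross_val_score takes only one dataset.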
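
To make that concrete, here is a minimal sketch of what K-Fold does with a single dataset. The toy array and the choice of 5 folds are assumptions for illustration only, not the asker's actual data or kfold object.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy stand-in for 'data' (10 samples, 2 features); purely illustrative.
data = np.arange(20).reshape(10, 2)

kfold = KFold(n_splits=5)  # assumed fold count

# Count how many times each sample lands in a test fold.
test_counts = np.zeros(len(data), dtype=int)
for train_idx, test_idx in kfold.split(data):
    # With 10 samples and 5 folds, each round trains on 8 and tests on 2.
    test_counts[test_idx] += 1

# Every sample is used for testing exactly once across the folds.
all_tested_once = bool((test_counts == 1).all())
```

Both the training and the test indices come from the same array, so there is no place in this scheme to plug in a second, separate dataset.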

You can read up more on cross-validation here.
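
If the goal is to train on 'data' and 'isTarget' and then score on 'data_val' and 'isTarget_val', one common approach is to skip cross_val_score and call fit and score on the estimator directly. A minimal sketch follows; the synthetic arrays from make_classification are stand-ins for the asker's variables, and the sample counts and random seed are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score

# Synthetic stand-ins for the asker's arrays (assumed shapes and seed).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
data, isTarget = X[:200], y[:200]          # training set
data_val, isTarget_val = X[200:], y[200:]  # separate validation set

# Train on one dataset only.
model = LinearDiscriminantAnalysis()
model.fit(data, isTarget)

# Score on the other dataset; for classifiers, score() returns mean accuracy.
val_accuracy = model.score(data_val, isTarget_val)

# Equivalent computation via predict() and accuracy_score().
val_accuracy_check = accuracy_score(isTarget_val, model.predict(data_val))
```

cross_val_score can still be run on 'data' and 'isTarget' alone to estimate performance within the training set; the held-out score above is a separate number computed on data the model never saw.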
