I am trying to calculate the ROC AUC score for my data, but the result is nan.
The code:
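from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import KFold, cross_val_score

scoring = 'roc_auc'
kfold = KFold(n_splits=10, random_state=42, shuffle=True)
model = LinearDiscriminantAnalysis()
results = cross_val_score(model, df_n, y, cv=kfold, scoring=scoring)
print("AUC: %.3f (%.3f)" % (results.mean(), results.std()))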
df_n is an array of the normalised values; I also tried it with just the X values from the dataset. y is an array of binary values.
df_n shape: (150, 4) y shape: (150,)
I am stumped, it should work!
CodePudding user response:
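The problem is that roc_auc_score expects class probabilities, not predicted labels, in the case of multi-class classification. The built-in 'roc_auc' scorer never passes multi_class to roc_auc_score, so with more than two classes scoring fails in every fold and cross_val_score records nan, which is its default error_score.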
Use a new scorer:
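from sklearn.metrics import roc_auc_score, make_scorer
from sklearn.model_selection import cross_validate
# needs_proba=True feeds predict_proba output to the metric (newer scikit-learn releases use response_method="predict_proba" instead)
multi_roc_scorer = make_scorer(lambda y_in, y_p_in: roc_auc_score(y_in, y_p_in, multi_class='ovr'), needs_proba=True)
# error_score="raise" surfaces the underlying exception instead of silently returning nan
scores = cross_validate(model, df_n, y, scoring=multi_roc_scorer, cv=kfold, error_score="raise")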
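For reference, here is a minimal self-contained sketch of the same idea. It assumes the data looks like the iris dataset (shape (150, 4), three classes), which is only a guess based on the shapes in the question, so load_iris is a stand-in for df_n and y. It uses the built-in 'roc_auc_ovr' scoring string rather than a custom scorer, and swaps KFold for StratifiedKFold so that every test fold contains all classes, which multi-class ROC AUC needs.

```python
import warnings

import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in data (assumption): iris has shape (150, 4) and three classes.
X, y = load_iris(return_X_y=True)
n_classes = len(np.unique(y))  # quick check: more than 2 means multi-class

# Stratified folds keep every class present in each test fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
model = LinearDiscriminantAnalysis()

# The plain binary scorer: on recent scikit-learn versions each fold fails
# to score and is recorded as nan (a warning is emitted, silenced here).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    binary_auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

# The one-vs-rest scorer uses predict_proba and handles multi-class targets.
ovr_auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc_ovr")

print("classes:", n_classes)
print("roc_auc per fold:", binary_auc)
print("AUC (OvR): %.3f (%.3f)" % (ovr_auc.mean(), ovr_auc.std()))
```

If np.unique(y) shows only two classes, the cause is probably something else, and passing error_score="raise" to cross_val_score should show the actual exception.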