I'm training a support vector machine classifier with scikit-learn, and when I evaluate it, accuracy, F1, precision, and recall all come out to exactly the same value.
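# Imports the snippet relies on
from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# df is assumed to be a pandas DataFrame with a "Text" column (documents) and a "Mood" column (labels)
x = df["Text"]
y = df["Mood"]
test_size = 5122
# Note: as written, the last `test_size` rows are used for training and all earlier rows for testing
x_test = x[:-test_size]
y_test = y[:-test_size]
x_train = x[-test_size:]
y_train = y[-test_size:]
# Turn the training text into token counts, then into TF-IDF features
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(x_train)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
# Note: the test text is only count-vectorized here; tfidf_transformer is not applied to it
x_test = count_vect.transform(x_test).toarray()
# Linear-kernel SVM (degree and gamma have no effect with kernel='linear')
SVM = svm.SVC(C=1.0, kernel='linear', degree=3, gamma='auto')
SVM.fit(X_train_tfidf, y_train)
predictions_SVM = SVM.predict(x_test)
# Evaluate with micro-averaged metrics
print('Accuracy score is: ', accuracy_score(y_test, predictions_SVM))
print('F1 score is: ', f1_score(y_test, predictions_SVM, average='micro'))
print('Precision score is: ', precision_score(y_test, predictions_SVM, average='micro'))
print('Recall score is: ', recall_score(y_test, predictions_SVM, average='micro'))
Output:
Accuracy score is: 0.9687622022647403
F1 score is: 0.9687622022647403
Precision score is: 0.9687622022647403
Recall score is: 0.9687622022647403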
Is this normal or have I made an error somewhere?
CodePudding user response:
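According to the documentation for these scores, they should all come out the same when you use average='micro' on a single-label multiclass problem.
With micro averaging, true positives, false positives, and false negatives are summed over all classes, and every misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class). Precision, recall, and F1 therefore all reduce to the fraction of samples that get the correct label, which is the accuracy.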
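To make the counting argument concrete, here is a minimal sketch that tallies the per-class counts by hand from a confusion matrix. The labels are the small toy example used on the scikit-learn metric pages, not the data from the question, and the variable names are only illustrative.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels (illustrative only, not the question's data)
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)

tp = np.diag(cm)            # correct predictions per class
fp = cm.sum(axis=0) - tp    # predicted as the class, but actually another class
fn = cm.sum(axis=1) - tp    # actually the class, but predicted as another class

# Micro averaging pools the counts over all classes before dividing
micro_precision = tp.sum() / (tp.sum() + fp.sum())
micro_recall = tp.sum() / (tp.sum() + fn.sum())
accuracy = tp.sum() / cm.sum()

print(fp.sum(), fn.sum())                        # the pooled FP and FN totals match
print(micro_precision, micro_recall, accuracy)   # all three values are identical
```

Each wrong prediction adds one to the false-positive total and one to the false-negative total, so both denominators equal the number of samples.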
See the examples:
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html
In fact, the last three pages all use the same example data, and with average='micro' each of them reports the same score.
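If you want metrics that can differ from accuracy, one option is a different averaging mode such as 'macro', which computes the metric per class and then takes the unweighted mean. The sketch below compares the two modes on the same toy labels used in the scikit-learn examples (again, not the question's data).

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy labels (illustrative only, not the question's data)
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

acc = accuracy_score(y_true, y_pred)

# Micro averaging: pooled counts, so all three equal the accuracy
micro = {
    "precision": precision_score(y_true, y_pred, average="micro"),
    "recall": recall_score(y_true, y_pred, average="micro"),
    "f1": f1_score(y_true, y_pred, average="micro"),
}

# Macro averaging: per-class scores averaged, so the values can differ
macro = {
    "precision": precision_score(y_true, y_pred, average="macro"),
    "recall": recall_score(y_true, y_pred, average="macro"),
    "f1": f1_score(y_true, y_pred, average="macro"),
}

print("accuracy:", acc)
print("micro:", micro)
print("macro:", macro)
```

Passing average=None instead returns one score per class, which is often the most informative view when the classes are imbalanced.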