Can't predict the result. Get wrong result in Multi label classification with OneVsRestClassifi


Multi Label Classification

gender  age weight  height  vitamin_A   vitamin_C   vitamin_D
0       55  64      128     0           1           0
0       54  72      135     0           1           0
0       82  70      150     1           1           1
0       82  70      150     1           1           1
0       59  64      107     0           1           0

The features are gender, age, weight, and height.

The labels are vitamin A, C, and D.

X = df[['gender', 'age', 'weight', 'height']]
y = df[['vitamin_A', 'vitamin_C', 'vitamin_D']]

I built a simple multi-label classification model with OneVsRestClassifier.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)
LR_pipeline = Pipeline([('clf', OneVsRestClassifier(LogisticRegression(solver='sag'), n_jobs=-1))])
labels = ['vitamin_A', 'vitamin_C', 'vitamin_D']
acc_classifier = []

for label in labels:
    LR_pipeline.fit(X_train, y_train[label])
    prediction = LR_pipeline.predict(X_test)
    acc = accuracy_score(y_test[label], prediction)
    acc_classifier.append(acc)  # collect the accuracy for the summary table

df_ = pd.DataFrame({'Label': labels, 'Accuracy': acc_classifier})

    Label      Accuracy
0   vitamin_A   0.75
1   vitamin_C   0.65
2   vitamin_D   1.00

The original code is in code link. The data is in data link.

But I do not know how to use the fitted model for prediction. I tried, but the result seems wrong: every time I run it I get the same output, 1, 1, 1.

data_test = [[0, 82, 70, 150]]
for label in labels:
    y_predict = LR_pipeline.predict(data_test)

The result is [1][1][1] every time, even when I change the input values.

My expected output is:

Input: gender=0, age=55, weight=64, height=128

Result1: vitamin A is 0, vitamin C is 1, vitamin D is 0

Result2: vitamin A is 0.64, vitamin C is 0.82, vitamin D is 0.34

vitamin_A vitamin_C vitamin_D vitamin_A_prob vitamin_C_prob vitamin_D_prob
0         1         0         0.64           0.82           0.34    

CodePudding user response:

First, you're fitting the same model multiple times. The fit method reinitializes the model, discarding any previously trained parameters.
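To make the first point concrete, here is a minimal sketch of what refitting in a loop does. It uses synthetic data and the hypothetical label names label_a and label_b rather than the asker's vitamin.csv, so the numbers are only illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic stand-in data (an assumption, not the asker's dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
targets = {
    "label_a": (X[:, 0] > 0).astype(int),  # hypothetical label driven by feature 0
    "label_b": (X[:, 1] > 0).astype(int),  # hypothetical label driven by feature 1
}

clf = OneVsRestClassifier(LogisticRegression())
for name, y in targets.items():
    clf.fit(X, y)  # each call replaces whatever was fitted before

# After the loop the model only knows the last label it was fitted on.
pred = clf.predict(X)
acc = {name: float((pred == y).mean()) for name, y in targets.items()}

print(pred.shape)  # one value per sample, not one per label
print(acc)         # high for label_b, near chance for label_a
```

This is why the prediction loop in the question returns the same kind of output on every iteration: each pass queries the same model, which was last fitted on the final label only.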

Second, the provided dataset is not strictly multi-label, because there is a "2" in one row of the "vitamin_A" column. Assuming this is just a typo, you can call OneVsRestClassifier.fit directly on the whole label matrix; there is no need to fit each label separately. Just run:

LR_pipeline.fit(X_train, y_train)
prediction = LR_pipeline.predict(X_test)
subset_accuracy = accuracy_score(y_test, prediction)
accuracy_per_label = [accuracy_score(y_test[l], prediction[:, i]) for i, l in enumerate(labels)]
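
The two metrics in the snippet above measure different things. For a multi-label indicator matrix, accuracy_score counts a row as correct only if every label in that row matches, whereas the per-label version scores each column on its own. A small sketch with made-up arrays (not the asker's data) shows the difference:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up ground truth and predictions for 4 samples and 3 labels.
y_true = np.array([[0, 1, 0],
                   [1, 1, 1],
                   [0, 1, 0],
                   [1, 0, 0]])
y_pred = np.array([[0, 1, 0],
                   [1, 1, 0],
                   [0, 1, 1],
                   [1, 0, 0]])

# Subset accuracy: only rows 0 and 3 match on all three labels.
subset_accuracy = accuracy_score(y_true, y_pred)

# Per-label accuracy: score each column independently.
accuracy_per_label = [
    accuracy_score(y_true[:, i], y_pred[:, i]) for i in range(y_true.shape[1])
]

print(subset_accuracy)     # 0.5
print(accuracy_per_label)  # [1.0, 1.0, 0.5]
```

Subset accuracy can therefore be much lower than any single label's accuracy, which is worth keeping in mind when comparing it with the per-label table in the question.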

OneVsRestClassifier does what your training loop is trying to do: it trains one binary classifier per label, treating each label as a separate binary classification problem. In the multi-label setting this strategy is commonly known as the binary relevance method.
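
For the output format the question asks for (a 0/1 decision plus a probability per vitamin), a single fitted OneVsRestClassifier exposes both predict and predict_proba. The following is a sketch on synthetic data, not the asker's file; the way the labels are generated here is purely an assumption for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

labels = ["vitamin_A", "vitamin_C", "vitamin_D"]

# Synthetic stand-in data (assumption): 4 features, 3 binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
Y = np.column_stack([
    X[:, 0] > 0,
    X[:, 1] > 0,
    X[:, 2] + X[:, 3] > 0,
]).astype(int)

# One fit on the full (n_samples, n_labels) indicator matrix.
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)

sample = X[:1]                     # a single input row, shape (1, 4)
pred = clf.predict(sample)         # 0/1 decision per label, shape (1, 3)
proba = clf.predict_proba(sample)  # probability per label, shape (1, 3)

# Combine decisions and probabilities into one row per input.
result = pd.DataFrame(pred, columns=labels).join(
    pd.DataFrame(proba.round(2), columns=[f"{l}_prob" for l in labels])
)
print(result)
```

In the multi-label case each probability is estimated independently by its own binary classifier, so the values in a row do not need to sum to 1.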

CodePudding user response:

Convert the pandas DataFrames to NumPy arrays with .to_numpy() (the label array then has shape (n_samples, n_classes)) and fit directly; there is no need to loop over each label. This works for me: different test inputs give different predictions.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

df = pd.read_csv("vitamin.csv")

X = df[['gender', 'age', 'weight', 'height']]
y = df[['vitamin_A', 'vitamin_C', 'vitamin_D']]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True, random_state=42)
clf = OneVsRestClassifier(LogisticRegression(solver='sag')).fit(X_train.to_numpy(), y_train.to_numpy())

data_test1 = [[0, 57, 79, 145]]
data_test2 = [[0, 59, 64, 107]]
data_test3 = [[0, 89, 74, 107]]
y_predict1 = clf.predict(data_test1)
y_predict2 = clf.predict(data_test2)
y_predict3 = clf.predict(data_test3)
[1 1 0]
[0 0 0]
[0 0 0]