I'm working on a logistic regression assignment and my professor has this code example.
What is the new_x variable and why are we transforming it as a matrix?
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
data = pd.DataFrame( {'id': [ 1,2,3,4,5,6,7,8], 'Label': ['green', 'green', 'green', 'green',
'red', 'red', 'red', 'red'],
'Height': [5, 5.5, 5.33, 5.75, 6.00, 5.92, 5.58, 5.92],
'Weight': [100, 150, 130, 150, 180, 190, 170, 165], 'Foot': [6, 8, 7, 9, 13, 11, 12, 10]},
columns = ['id', 'Height', 'Weight', 'Foot', 'Label'] )
X = data[['Height', 'Weight']].values
scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)
Y = data['Label'].values
log_reg_classifier = LogisticRegression()
log_reg_classifier.fit(X,Y)
new_x = scaler.transform(np.asmatrix([6, 160]))
predicted = log_reg_classifier.predict(new_x)
accuracy = log_reg_classifier.score(X, Y)
CodePudding user response:
Let's take it step by step.
data = pd.DataFrame( {'id': [ 1,2,3,4,5,6,7,8], 'Label': ['green', 'green', 'green', 'green',
'red', 'red', 'red', 'red'],
'Height': [5, 5.5, 5.33, 5.75, 6.00, 5.92, 5.58, 5.92],
'Weight': [100, 150, 130, 150, 180, 190, 170, 165], 'Foot': [6, 8, 7, 9, 13, 11, 12, 10]},
columns = ['id', 'Height', 'Weight', 'Foot', 'Label'] )
You create an initial DataFrame that contains the columns ['id', 'Height', 'Weight', 'Foot', 'Label'].
X = data[['Height', 'Weight']].values
You then obtain a np.array that contains only height and weight using data[['Height', 'Weight']].values. See pandas docs on slicing for more info. You can obtain the size of the feature matrix with X.shape, i.e., [n,2], where n = 8 is the number of samples here.
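The column selection and the resulting shape can be checked with a small sketch like the following. It rebuilds only the columns needed from the question's data; the names n_samples and n_features are introduced here for illustration.

```python
import pandas as pd

# Same toy data as in the question (only the columns needed here).
data = pd.DataFrame({
    "Height": [5, 5.5, 5.33, 5.75, 6.00, 5.92, 5.58, 5.92],
    "Weight": [100, 150, 130, 150, 180, 190, 170, 165],
    "Label": ["green"] * 4 + ["red"] * 4,
})

# Double brackets select a list of columns; .values returns a plain
# NumPy array without the pandas index or column names.
X = data[["Height", "Weight"]].values

# Rows are samples, columns are features.
n_samples, n_features = X.shape
print(type(X).__name__, X.shape)
```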
X = data[['Height', 'Weight']].values
scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)
Y = data['Label'].values
log_reg_classifier = LogisticRegression()
log_reg_classifier.fit(X,Y)
You use only those two features to train the logistic regression after standardization.
That is, your classifier is learned on only two features (i.e., height and weight), but on multiple samples. Every classifier in sklearn implements the fit() method to fit the classifier to the training data.
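To make the standardization step less of a black box, here is a minimal sketch comparing StandardScaler with the same computation done by hand (subtract the column mean, divide by the column standard deviation). The variable names X_raw, X_scaled and X_manual are made up for this illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Height and weight columns from the question, one row per sample.
X_raw = np.array([
    [5.00, 100], [5.50, 150], [5.33, 130], [5.75, 150],
    [6.00, 180], [5.92, 190], [5.58, 170], [5.92, 165],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_raw)  # same as fit() followed by transform()

# By hand: per column, subtract the mean and divide by the standard deviation.
X_manual = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

print(np.allclose(X_scaled, X_manual))
print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))
```

After scaling, each column has mean 0 and standard deviation 1, so height (around 5 to 6) and weight (around 100 to 190) end up on comparable scales.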
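As your model is trained on a feature matrix with two features, the sample that you want to predict (new_x) also needs two features. sklearn scalers and classifiers expect 2-D input of shape [n_samples, n_features], even for a single sample, which is why the values are wrapped in a matrix. Thus, you first create np.asmatrix([6, 160]) with shape [1,2] and elements [height=6, weight=160], scale it with the scaler that was fitted on the training data, and pass it to your trained model. log_reg_classifier.predict(new_x) returns the predicted label. Finally, log_reg_classifier.score(X, Y) compares the predictions for X with the true labels Y and returns the mean accuracy; since X is the training data here, this is the training accuracy. Et voilà.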
new_x = scaler.transform(np.asmatrix([6, 160]))
predicted = log_reg_classifier.predict(new_x)
accuracy = log_reg_classifier.score(X, Y)
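The shape requirement and the accuracy calculation can be sketched as below. Note one deliberate swap: NumPy no longer recommends its matrix class, and newer scikit-learn releases may reject np.matrix inputs, so this sketch uses a plain 2-D np.array instead of np.asmatrix. That substitution, and the names clf, flat and manual_accuracy, are assumptions of this example rather than part of the professor's code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Height and weight columns plus labels from the question.
X_raw = np.array([
    [5.00, 100], [5.50, 150], [5.33, 130], [5.75, 150],
    [6.00, 180], [5.92, 190], [5.58, 170], [5.92, 165],
])
Y = np.array(["green"] * 4 + ["red"] * 4)

scaler = StandardScaler().fit(X_raw)
X = scaler.transform(X_raw)
clf = LogisticRegression().fit(X, Y)

flat = np.array([6, 160])    # shape (2,): 1-D, rejected by transform()/predict()
new_x = flat.reshape(1, -1)  # shape (1, 2): one sample with two features
# np.array([[6, 160]]) builds the same 2-D array directly.

# Scale the new sample with the statistics learned from the training data.
new_x_scaled = scaler.transform(new_x)
predicted = clf.predict(new_x_scaled)  # array holding one predicted label

# score(X, Y) is the fraction of samples whose prediction matches the label.
accuracy = clf.score(X, Y)
manual_accuracy = np.mean(clf.predict(X) == Y)
print(flat.shape, new_x.shape, predicted, accuracy)
```

Because score() is evaluated on the same samples the model was trained on, the number it returns says how well the model fits the training data, not how well it generalizes to unseen samples.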