How does matplotlib classify the training examples and plot them in the code given below?

This is a classification-based logistic regression problem. X consists of data in the form:

array([[34.62365962, 78.02469282],
       [30.28671077, 43.89499752],
       [35.84740877, 72.90219803],
       ...])

where the first column is the score in the first exam and the second column is the score in the second exam.

y is an array in which '1' means the student was admitted to university and '0' means the student was NOT admitted (based on their first and second exam scores in X):

array([0., 0., 0., 1., 1., 0., 1., 1., 1., 1., 0., 0., 1., 1., 0., 1., 1.,
       0., 1., 1., 0., 1., 0., 0., 1., 1., 1., 0., 0., 0., 1., 1., 0., 1.,
       0., 0., 0., 1., 0., 0., 1., 0., 1., 0., 0., 0., 1., 1., 1., 1., 1.,
       1., 1., 0., ...])

The code to plot the data is:

from matplotlib import pyplot

pos = y == 1
neg = y == 0
# Plot Examples
pyplot.plot(X[pos, 0], X[pos, 1], '*', lw=2, ms=10)
pyplot.plot(X[neg, 0], X[neg, 1], 'o', mfc='y', ms=8, mec='k', mew=1)

I do not understand how pos = y == 1 and neg = y == 0 work to select the x-axis and y-axis values given by X[pos, 0], X[pos, 1] and X[neg, 0], X[neg, 1].

Answer:

The == operator creates a boolean array, i.e. an array of True or False values, e.g.:

import numpy as np
y = np.array([1,0,1,1,0,0])
print(y==1) # [ True False  True  True False False]
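One detail worth checking: in the question, y holds floats (0., 1.), not ints. The comparison still works because 1.0 == 1 is True element by element, and the result is a boolean array with the same shape as y. A minimal sketch, using a short hypothetical label array:

```python
import numpy as np

# Hypothetical labels stored as floats, like the y array in the question
y = np.array([0., 0., 1., 1., 0.])

mask = y == 1      # element-wise comparison; 1.0 == 1 is True
print(mask)        # [False False  True  True False]
print(mask.dtype)  # bool
print(mask.shape)  # (5,), the same shape as y
```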

You can then use this True/False array (also known as a "mask") to filter your array. Say, for instance, you have the following:

y = np.array([0.5,1,-3,4,0])
pos = y>0 #[True,True,False,True,False]

If you then index the array with the mask (boolean indexing), it returns only the values where the mask is True:

y = np.array([0.5,1,-3,4,0])
pos = y>0 #[True,True,False,True,False]
print(y[pos]) #[0.5,1,4]
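Boolean indexing behaves like a filter that walks the values and the mask side by side. The loop below only illustrates the semantics (NumPy does not actually run a Python loop internally):

```python
import numpy as np

y = np.array([0.5, 1, -3, 4, 0])
pos = y > 0

# Boolean indexing: keep y[i] wherever pos[i] is True
masked = y[pos]

# The same selection written as an explicit loop over (value, flag) pairs
looped = [value for value, keep in zip(y, pos) if keep]

print(masked.tolist())            # [0.5, 1.0, 4.0]
print(looped == masked.tolist())  # True
```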

With that in mind, you can see that

pos = y==1
neg = y==0

creates two masks: one for all the places where y is equal to 1, and one for all the places where y is equal to 0.
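Because y contains only 0 and 1, the two masks are complements of each other: every example falls into exactly one group, so the two plot calls together draw every row of X exactly once. A quick check on hypothetical labels:

```python
import numpy as np

y = np.array([0., 1., 1., 0., 1.])  # hypothetical 0/1 labels

pos = y == 1
neg = y == 0

# With strictly binary labels, neg is exactly the logical NOT of pos
print(np.array_equal(neg, ~pos))  # True
# Summing a boolean mask counts its True entries
print(pos.sum(), neg.sum())       # 3 2
```

If y ever contained a third value, that element would be False in both masks and would not be plotted at all.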

Since each row of X corresponds to the element of y at the same position (row i of X belongs to y[i]), we can use the same mask to index both X and y, e.g.:

pos = y==1
neg = y==0

X_pos = X[pos]  # all rows of X (both exam scores) where y == 1
y_pos = y[pos]  # all labels where y == 1 (so all 1s)

X_neg = X[neg]  # all rows of X (both exam scores) where y == 0
y_neg = y[neg]  # all labels where y == 0 (so all 0s)

and then you can plot those two groups.

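The code in the question does the same thing implicitly, in one step, by indexing with the mask and a column number at once:

X[pos, 0]  # corresponds to the first column of X_pos above (x-axis values)
X[pos, 1]  # corresponds to the second column of X_pos above (y-axis values)
X[neg, 0]  # corresponds to the first column of X_neg above
X[neg, 1]  # corresponds to the second column of X_neg above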
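The equivalence between the one-step and two-step forms can be checked directly. This is a minimal sketch on hypothetical exam scores and labels; split_for_plot is a hypothetical helper name, not part of NumPy or matplotlib. It returns exactly the coordinate arrays that the two pyplot.plot calls in the question receive:

```python
import numpy as np

# Hypothetical exam scores (rows = students, columns = exam 1, exam 2)
X = np.array([[30.0, 45.0],
              [80.0, 75.0],
              [55.0, 90.0],
              [40.0, 35.0]])
y = np.array([0., 1., 1., 0.])  # hypothetical admission labels

pos = y == 1

# Two-step version: select rows first, then take a column
X_pos = X[pos]
# One-step version: mask on axis 0 (rows), column index on axis 1
assert np.array_equal(X[pos, 0], X_pos[:, 0])
assert np.array_equal(X[pos, 1], X_pos[:, 1])

def split_for_plot(X, y):
    """Hypothetical helper: return (x, y) coordinate arrays for each class."""
    pos = y == 1
    neg = y == 0
    return (X[pos, 0], X[pos, 1]), (X[neg, 0], X[neg, 1])

(px, py), (nx, ny) = split_for_plot(X, y)
print(px, py)  # [80. 55.] [75. 90.]
print(nx, ny)  # [30. 40.] [45. 35.]
```

Here px and py would be the first and second arguments of the '*' plot call, and nx and ny those of the 'o' plot call.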

Answer:

I'm not sure I understood the question, but I'll try to answer.

pos = (y == 1)

(I've added the parentheses to make it clearer.) What actually happens here is that pos is assigned something of this form:

[True, True, False, .....]

where each True corresponds to an index at which y has a 1; neg works the same way, but for 0.

Now when you use X[pos, 0], it returns an array consisting only of the first-column values from the rows of X where pos is True at the corresponding index. Therefore X[pos, 0] gives the x-axis values (first exam scores) of the admitted students, and X[pos, 1] gives their y-axis values (second exam scores).
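To make "the rows of X where pos is True" concrete, here is the same selection done row by row in plain Python, as an illustration only. It also shows why pos needs exactly one entry per row of X. The scores and mask are hypothetical:

```python
import numpy as np

X = np.array([[30.0, 45.0],
              [80.0, 75.0],
              [55.0, 90.0]])           # hypothetical scores
pos = np.array([False, True, True])   # hypothetical mask, one flag per row of X

# Walk the rows of X alongside the mask; keep column 0 / column 1 of each kept row
xs = [row[0] for row, keep in zip(X, pos) if keep]
ys = [row[1] for row, keep in zip(X, pos) if keep]

print(xs == X[pos, 0].tolist())  # True
print(ys == X[pos, 1].tolist())  # True

# The mask must have one entry per row; a mismatched length raises IndexError
try:
    X[np.array([True, False]), 0]
except IndexError as err:
    print("IndexError:", err)
```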