This code plots a data set, coloring each point by the class it belongs to. X_train is an array of shape (115, 2) and Y_train is another array of shape (115,) holding the corresponding class labels. My question is: what exactly does [Y_train == i] do?
colors = ["red", "greenyellow", "blue"]
for i in range(len(colors)):
    xs = X_train[:, 0][Y_train == i]
    ys = X_train[:, 1][Y_train == i]
    plt.scatter(xs, ys, c=colors[i])
plt.legend(iris.target_names)
plt.xlabel("Sepal length")
plt.ylabel("Sepal width")
CodePudding user response:
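In Python, bool is a subclass of int, so True and False can be used wherever the integers 1 and 0 are expected, including as list indexes. When a comparison such as Y_train == i evaluates to a single False or True, that value then accesses index 0 or 1 respectively. Note that this only holds when the comparison yields one boolean; if Y_train is a NumPy array, the comparison is elementwise and produces an array of booleans instead.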
>>> a = ['this string is at index 0', 'this string is at index 1']
>>> a[True]
'this string is at index 1'
>>> a[False]
'this string is at index 0'
>>> a[1 + 2 == 3]  # True
'this string is at index 1'
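The claim that booleans behave like the integers 0 and 1 can be checked directly. The sketch below is only an illustration with a made-up list (labels is a hypothetical name); it covers plain Python lists, where the comparison yields a single boolean, and says nothing about NumPy arrays.

```python
# bool is a subclass of int, so True and False behave like 1 and 0.
is_subclass = issubclass(bool, int)
true_as_int = int(True)
false_as_int = int(False)

# Hypothetical list used only for illustration.
labels = ["index 0", "index 1"]

# A scalar comparison yields one bool, which then acts as a list index.
picked_when_true = labels[1 + 2 == 3]
picked_when_false = labels[1 + 2 == 4]

print(is_subclass, true_as_int, false_as_int)
print(picked_when_true, picked_when_false)
```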
CodePudding user response:
When you perform a comparison in NumPy, such as Y_train == i, the result is a boolean mask: an array of the same shape as Y_train that contains True wherever the value equals i and False everywhere else.
So, for example, with a simple array like:
y = np.array([1,2,1,3])
If you look at y == 1, the result is:
array([True, False, True, False])
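A mask like this can be inspected a little further. The following is a minimal sketch that reuses the y array from above; the names mask, n_matches and positions are introduced here only for illustration.

```python
import numpy as np

y = np.array([1, 2, 1, 3])
mask = y == 1                      # elementwise comparison, one bool per entry

print(mask.dtype, mask.shape)      # boolean dtype, same shape as y
n_matches = int(mask.sum())        # True counts as 1, so this counts the matches
positions = np.flatnonzero(mask)   # indices at which the mask is True
print(n_matches, positions)
```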
Now take a simple 2-D array:
x = np.array([[1,2],[3,4],[5,6],[7,8]])
x
Out[10]:
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])
Your code first selects one column at a time, for example:
x[:, 0]
Out[11]: array([1, 3, 5, 7])
The boolean mask is then applied, which returns only the values in that column at the positions where the y == 1 mask is True:
x[:, 0][y == 1]
Out[14]: array([1, 5])
So the above has exactly the same result as:
x[:, 0][[True, False, True, False]]
Out[16]: array([1, 5])
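Putting the pieces together, the loop from the question can be mimicked on a tiny made-up data set. This is a sketch under assumptions: X_demo and Y_demo are hypothetical stand-ins for X_train and Y_train, and the plotting calls are left out so that only the selection step is shown.

```python
import numpy as np

# Hypothetical stand-ins for X_train (n, 2) and Y_train (n,).
X_demo = np.array([[5.1, 3.5], [7.0, 3.2], [6.3, 3.3], [4.9, 3.0], [6.4, 3.2]])
Y_demo = np.array([0, 1, 2, 0, 1])

selected = {}
for i in range(3):
    mask = Y_demo == i          # boolean array, True where the label equals i
    xs = X_demo[:, 0][mask]     # first column, rows of class i only
    ys = X_demo[:, 1][mask]     # second column, rows of class i only
    selected[i] = (xs, ys)
    print(i, mask, xs, ys)

# Selecting rows and column in a single indexing step gives the same values.
same = all(
    np.array_equal(X_demo[Y_demo == i, 0], selected[i][0]) for i in range(3)
)
print(same)
```

Each pass through the loop therefore hands plt.scatter only the points of one class, which is why every class can get its own color.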