I am trying to plot the results of a KMeans model on three datasets. The code that generates them is as follows:
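import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs, make_circles, make_classification
blobsX, blobsY = make_blobs(n_samples=1000, n_features=2, random_state=177)
classX, classY = make_classification(n_samples=1000, n_features=2, n_redundant=0,
                                     n_clusters_per_class=1, random_state=177)
circleX, circleY = make_circles(n_samples=1000, noise=0.3, random_state=177)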
When I run the models and plot them in separate chunks of code, it works:
kmeans = KMeans(n_clusters=3)
label = kmeans.fit_predict(blobsX)
labels = np.unique(label)
for i in labels:
    plt.scatter(blobsX[label == i, 0], blobsX[label == i, 1], label=i)
plt.show()
kmeans = KMeans(n_clusters=2)
label = kmeans.fit_predict(classX)
labels2 = np.unique(label)
for i in labels2:
    plt.scatter(classX[label == i, 0], classX[label == i, 1], label=i)
plt.show()
kmeans = KMeans(n_clusters=2)
label = kmeans.fit_predict(circleX)
labels3 = np.unique(label)
for i in labels3:
    plt.scatter(circleX[label == i, 0], circleX[label == i, 1], label=i)
plt.show()
When I try to put them all into subplots in the same block of code, only one of the plots comes out correctly and the other two break:
kmeans = KMeans(n_clusters=3)
label = kmeans.fit_predict(blobsX)
labels = np.unique(label)
kmeans = KMeans(n_clusters=2)
label = kmeans.fit_predict(classX)
labels2 = np.unique(label)
kmeans = KMeans(n_clusters=2)
label = kmeans.fit_predict(circleX)
labels3 = np.unique(label)
fig = plt.figure(figsize=(20,5))
ax = fig.add_subplot(131)
ax2 = fig.add_subplot(132)
ax3 = fig.add_subplot(133)
for i in labels:
    ax.scatter(blobsX[label == i, 0], blobsX[label == i, 1], label=i)
for j in labels2:
    ax2.scatter(classX[label == j, 0], classX[label == j, 1], label=j)
for k in labels3:
    ax3.scatter(circleX[label == k, 0], circleX[label == k, 1], label=k)
plt.show()
Why is this happening and what is the best way to fix it?
Answer:
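The problem is not the colors: matplotlib already cycles through its default color sequence for each scatter call on the same Axes. The problem is that the variable label is overwritten. Every call to kmeans.fit_predict(...) assigns its result to the same name, label, so by the time the plotting loops run, label only holds the cluster assignments for circleX. The circles subplot is therefore correct, while the blobs and classification subplots are split using labels that belong to a different dataset. No error is raised because all three datasets have 1000 samples, so the boolean masks still have the right length.
The fix is to keep a separate label array for each dataset (for example blobs_label, class_label and circle_label) and use the matching array in each plotting loop, or to fit and plot each dataset in the same step so the labels cannot leak into another subplot.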
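A sketch of the second option, assuming you are happy to restructure the code into a loop: each dataset is fitted and plotted inside the same iteration, so every subplot is masked with the labels of its own dataset. The datasets list, the plot_kmeans_panels helper and the subplot titles are illustrative names of my own, not part of the question or of any library. n_init=10 and random_state=0 are added only to make the runs reproducible and to avoid the n_init FutureWarning that some scikit-learn versions emit.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs, make_circles, make_classification

# Same datasets as in the question
blobsX, blobsY = make_blobs(n_samples=1000, n_features=2, random_state=177)
classX, classY = make_classification(n_samples=1000, n_features=2, n_redundant=0,
                                     n_clusters_per_class=1, random_state=177)
circleX, circleY = make_circles(n_samples=1000, noise=0.3, random_state=177)

# Hypothetical config: (data, number of clusters, subplot title) for each panel
datasets = [
    (blobsX, 3, "blobs"),
    (classX, 2, "classification"),
    (circleX, 2, "circles"),
]


def plot_kmeans_panels(datasets):
    """Illustrative helper: fit one KMeans per dataset and draw it on its own subplot.

    Returns the figure and the list of label arrays, one per dataset."""
    # squeeze=False keeps axes 2-D even when there is only one dataset
    fig, axes = plt.subplots(1, len(datasets), figsize=(20, 5), squeeze=False)
    all_labels = []
    for ax, (X, k, title) in zip(axes[0], datasets):
        # label is computed and used within the same iteration,
        # so this panel can only be split by its own dataset's clusters
        label = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        all_labels.append(label)
        for i in np.unique(label):
            ax.scatter(X[label == i, 0], X[label == i, 1], label=i)
        ax.set_title(title)
        ax.legend()
    return fig, all_labels


if __name__ == "__main__":
    fig, all_labels = plot_kmeans_panels(datasets)
    plt.show()
```

Using plt.subplots instead of three fig.add_subplot calls returns all the Axes at once, which is what makes zipping them with the datasets convenient.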
Happy coding