I am working on k-means clustering for customer segmentation. My input data has 12 features and 7315 rows.
So I tried the code below to run k-means:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters = 5, init = "k-means++", random_state = 42)
data_normalized['y_kmeans'] = kmeans.fit_predict(data_normalized)
To visualize the clusters, I tried the code below:
u_labels = np.unique(data_normalized['y_kmeans'])
# plotting the results:
for i in u_labels:
    plt.scatter(data_normalized[y_kmeans == i , 0] , data_normalized[y_kmeans == i , 1] , label = i)
plt.legend()
plt.show()
I got the following error:
TypeError: '(array([False, False, False, ..., False, False, False]), 0)' is an invalid key
InvalidIndexError: (array([False, False, False, ..., False, False, False]), 0)
How can I visualize my clusters to see how far they are from each other?
Answer:
Since I do not have your dataset, I simulated your dataframe as follows (assuming 9 different cluster groups):
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# two random feature columns in (0, 1) plus a random cluster label from 1 to 9
d = {'col1': [i/100 for i in random.choices(range(1,100), k=7315)],
     'col2': [i/100 for i in random.choices(range(1,100), k=7315)],
     'y_kmeans': random.choices(range(1,10), k=7315)}
data_normalized = pd.DataFrame(d)
After that, you can plot the clusters as follows:
u_labels = np.unique(data_normalized['y_kmeans']).tolist()
scatter = plt.scatter(data_normalized['col1'], data_normalized['col2'],
                      c=data_normalized['y_kmeans'], cmap='tab20')
plt.legend(handles=scatter.legend_elements()[0], labels=u_labels)
plt.show()
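As for the error itself: `data_normalized[y_kmeans == i , 0]` is NumPy-array syntax, and a pandas DataFrame rejects a `(mask, column)` tuple as a key, which is what the `InvalidIndexError` is reporting. If you prefer to keep your per-cluster loop, a minimal sketch using `.loc` could look like the following. The helper name `plot_clusters` and the small stand-in dataframe are my own assumptions for illustration, not part of your code.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_clusters(df, x_col, y_col, label_col="y_kmeans"):
    """Scatter-plot two columns of df, drawing each cluster label separately."""
    fig, ax = plt.subplots()
    for label in np.unique(df[label_col]):
        # Boolean Series with one entry per row of df
        mask = df[label_col] == label
        # .loc accepts (row mask, column name); plain df[mask, 0] does not
        ax.scatter(df.loc[mask, x_col], df.loc[mask, y_col], label=label)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.legend(title="cluster")
    return ax


# Stand-in data (an assumption; replace with your own data_normalized)
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "col1": rng.random(300),
    "col2": rng.random(300),
    "y_kmeans": rng.integers(0, 5, size=300),
})
ax = plot_clusters(demo, "col1", "col2")
```

With your own data you would call something like `plot_clusters(data_normalized, "col1", "col2")` with two of your real column names, followed by `plt.show()`. Keep in mind that this only shows two of your 12 features at a time.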