How to visually compare clusters using Python?

Time:05-29

I am working on k-means clustering for customer segmentation. My input data has 12 features and 7315 rows.

I ran k-means with the code below:

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=5, init="k-means++", random_state=42)
data_normalized['y_kmeans'] = kmeans.fit_predict(data_normalized)

To visualize the clusters, I tried the code below:

u_labels = np.unique(data_normalized['y_kmeans'])
 
#plotting the results:
 
for i in u_labels:
    plt.scatter(data_normalized[y_kmeans == i , 0] , data_normalized[y_kmeans == i , 1] , label = i)
plt.legend()
plt.show()

I got the following error:

TypeError: '(array([False, False, False, ..., False, False, False]), 0)' is an invalid key

InvalidIndexError: (array([False, False, False, ..., False, False, False]), 0)

How can I visualize my clusters to see how far they are from each other?

Answer:

Since I do not have your dataset, I simulated your dataframe as follows (assuming 9 cluster groups, labeled 1 to 9):

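import random
import pandas as pd

# Simulated stand-in for the real data: two features plus a random label from 1 to 9
d = {'col1': [i/100 for i in random.choices(range(1, 100), k=7315)],
     'col2': [i/100 for i in random.choices(range(1, 100), k=7315)],
     'y_kmeans': random.choices(range(1, 10), k=7315)}
data_normalized = pd.DataFrame(d)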

After that, you can plot the clusters as follows:

import numpy as np
import random
import pandas as pd
import matplotlib.pyplot as plt

u_labels = np.unique(data_normalized['y_kmeans']).tolist()

scatter = plt.scatter(data_normalized['col1'], data_normalized['col2'],
            c=data_normalized['y_kmeans'], cmap='tab20')
plt.legend(handles=scatter.legend_elements()[0], labels=u_labels)
plt.show()

This produces a scatter plot of col1 against col2, with each point colored by its cluster label.
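As for the error in the original loop: `data_normalized[y_kmeans == i, 0]` is NumPy-style indexing, but `data_normalized` is a pandas DataFrame, so the `(mask, 0)` tuple is treated as a single column key and rejected. Below is a minimal sketch of the same per-cluster loop using `.loc` instead. The helper name `plot_clusters`, the column names `col1`/`col2`, and the random stand-in data are assumptions for illustration, not part of your code.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_clusters(df, x_col, y_col, label_col="y_kmeans"):
    """Scatter two feature columns, one colour and legend entry per cluster."""
    fig, ax = plt.subplots()
    for label in np.unique(df[label_col]):
        mask = df[label_col] == label  # boolean Series, one entry per row
        # .loc accepts (row mask, column name); df[mask, 0] does not.
        ax.scatter(df.loc[mask, x_col], df.loc[mask, y_col], label=label, s=10)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.legend(title="cluster")
    return ax


# Hypothetical stand-in data; replace with the real data_normalized.
rng = np.random.default_rng(42)
data_normalized = pd.DataFrame({
    "col1": rng.random(300),
    "col2": rng.random(300),
    "y_kmeans": rng.integers(0, 5, size=300),
})

ax = plot_clusters(data_normalized, "col1", "col2")
# Call plt.show() afterwards to display the figure.
```

With your real data, pass any two of the 12 feature column names as `x_col` and `y_col`. Keep in mind that a two-column scatter only shows the separation along those two features, not across all 12.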
