Get element in each cluster-CodePudding

I've got this following code which extract 2 feature(tempo & slotID) from csv file and plot kmeans clustering based on this 2 features.

df = pd.read_csv("prova.csv", encoding = "ISO-8859-1", sep = ';')
dfSlotMean = df.groupby('slotID', as_index=False)['tempo'].mean()

df = dfSlotMean[['tempo','slotID']]

###############################################
Sum_of_squared_distances = []

K = range(1,15)
for k in K:
    km = KMeans(n_clusters=k)
    km = km.fit(df)
    Sum_of_squared_distances.append(km.inertia_)


plt.plot(K, Sum_of_squared_distances, 'bx-')
plt.xlabel('k')
plt.ylabel('Sum_of_squared_distances')
plt.title('Elbow Method For Optimal k')
plt.show()
################################################

kmeans = KMeans(n_clusters=5).fit(df)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
print(centroids)

# plt.scatter(df['tempo'], df['slotID'], c= kmeans.labels_.astype(float), s=25, alpha=0.5)
# plt.scatter(centroids[:, 0], centroids[:, 1], c='red', s=50)
# plt.show()

plt.scatter(df['slotID'], df['tempo'], c= kmeans.labels_.astype(float), s=25, alpha=0.5)
plt.title('Martedì')
plt.scatter(centroids[:, 1], centroids[:, 0], c='red', s=25)
plt.show()
print(pd.Series(labels).value_counts())

What i want to do now is get the value assign in each cluster. How can i do that? This is the output of code:

In short, i want, for example, the points that belong to cluster number 1 are: 131,98; 135,76 ecc...

CodePudding user response：

Use the dataframe indexing to get the desired data. For example, if you want points from cluster 1, you can get them with

df[labels == 1]

and if you want to get them all:

for i in np.unique(labels):
    print(df[labels == i])