I've got this following code which extract 2 feature(tempo & slotID) from csv file and plot kmeans clustering based on this 2 features.
df = pd.read_csv("prova.csv", encoding = "ISO-8859-1", sep = ';')
dfSlotMean = df.groupby('slotID', as_index=False)['tempo'].mean()
df = dfSlotMean[['tempo','slotID']]
###############################################
Sum_of_squared_distances = []
K = range(1,15)
for k in K:
km = KMeans(n_clusters=k)
km = km.fit(df)
Sum_of_squared_distances.append(km.inertia_)
plt.plot(K, Sum_of_squared_distances, 'bx-')
plt.xlabel('k')
plt.ylabel('Sum_of_squared_distances')
plt.title('Elbow Method For Optimal k')
plt.show()
################################################
kmeans = KMeans(n_clusters=5).fit(df)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
print(centroids)
# plt.scatter(df['tempo'], df['slotID'], c= kmeans.labels_.astype(float), s=25, alpha=0.5)
# plt.scatter(centroids[:, 0], centroids[:, 1], c='red', s=50)
# plt.show()
plt.scatter(df['slotID'], df['tempo'], c= kmeans.labels_.astype(float), s=25, alpha=0.5)
plt.title('Martedì')
plt.scatter(centroids[:, 1], centroids[:, 0], c='red', s=25)
plt.show()
print(pd.Series(labels).value_counts())
What i want to do now is get the value assign in each cluster. How can i do that? This is the output of code:
In short, i want, for example, the points that belong to cluster number 1 are: 131,98; 135,76 ecc...
CodePudding user response:
Use the dataframe indexing to get the desired data. For example, if you want points from cluster 1, you can get them with
df[labels == 1]
and if you want to get them all:
for i in np.unique(labels):
print(df[labels == i])