In K-means clustering we can define the number of clusters. But is it possible to specify that cluster_1 will contain 20% of the data, cluster_2 will have 30%, and cluster_3 will have the rest of the data points?
I tried to do it in Python but couldn't.
CodePudding user response:
Here is a discussion on how to modify KMeans so that the clusters all have the same size. You could modify it further to make the clusters have your desired respective sizes.
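As a rough illustration of that idea (not the method from the linked discussion), here is a minimal sketch that assumes NumPy and SciPy are available. It replaces the usual "nearest center" step with a minimum-cost assignment of points to a fixed number of slots per cluster, so each cluster ends up with its quota. The function name `quota_kmeans` and its parameters are made up for this example, and it assumes every cluster gets at least one point. The assignment step is roughly cubic in the number of points, so it only suits small datasets.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def quota_kmeans(X, fractions, n_iter=20, seed=0):
    """K-means-style loop where cluster i receives about fractions[i] of the points.

    Hypothetical helper for illustration; not part of scikit-learn or SciPy.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    k = len(fractions)

    # Turn the fractions into integer cluster sizes that sum to n;
    # the last cluster absorbs any rounding remainder.
    sizes = np.round(np.asarray(fractions, dtype=float) * n).astype(int)
    sizes[-1] = n - sizes[:-1].sum()

    # One slot per point: slot_owner[j] is the cluster that slot j belongs to.
    slot_owner = np.repeat(np.arange(k), sizes)

    # Start from k distinct data points chosen at random.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(n, size=k, replace=False)]
    labels = np.full(n, -1)

    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        # Expand to shape (n, n): cost of putting point i into slot j.
        cost = d2[:, slot_owner]
        rows, cols = linear_sum_assignment(cost)

        new_labels = np.empty(n, dtype=int)
        new_labels[rows] = slot_owner[cols]

        # Standard K-means update: each center moves to the mean of its points.
        centers = np.array([X[new_labels == c].mean(axis=0) for c in range(k)])

        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels

    return labels, centers


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=(50, 2))
    labels, centers = quota_kmeans(data, [0.2, 0.3, 0.5])
    print(np.bincount(labels))  # cluster sizes
```

The sizes are enforced exactly because every point must occupy one slot and each cluster owns a fixed number of slots. The trade-off is that some points may be placed in a cluster whose center is not their nearest one.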
CodePudding user response:
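With K-means clustering, as you said, we specify the number of clusters, but the standard algorithm has no way to specify what percentage of the data points goes into each one. I would recommend looking at fuzzy C-means instead: it gives every point a degree of membership in each cluster rather than a single hard label, and you could use those memberships to allot points to clusters according to the percentages you want.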