I am applying the bisecting k-means algorithm to cluster users for each antenna beam.
The problem arises after splitting the cluster that contains all users into two. At that point I have to select the cluster with the highest WCSS (within-cluster sum of squares), but I don't know how to do it.
I had thought about taking advantage of the sumd and idx values.
function [Clustering, SYSTEM] = CLUST_Bkmeans(kk, SYSTEM, USERS, ChannelMatrix)
    Clustering = cell(SYSTEM.Nbeams,1);
    UserPool = (1:SYSTEM.Nusers)';
    Channel_real = real(ChannelMatrix);
    Channel_imag = imag(ChannelMatrix);
    avg_clusterSize = kk;
    for ii = 1:SYSTEM.Nbeams
        Users = UserPool(USERS.BeamIndex==ii);
        % Matrix of channel coefficients built as [real part | imaginary part]
        Users_real = Channel_real(Users,:);
        Users_imag = Channel_imag(Users,:);
        X = [Users_real Users_imag];
        SYSTEM.Nclusters(ii) = ceil(size(Users,1)/avg_clusterSize);
        Clustering{ii} = cell(SYSTEM.Nclusters(ii),1);
        % Bisecting k-means clustering of X
        [idx,C,sumd] = kmeans(X,2); % first division into two clusters
        for pp = 3:SYSTEM.Nclusters(ii)
            % kmeans applied to the cluster with the highest WCSS
        end
        % silhouette(X,idx)
        % xlabel('Silhouette Value')
        % ylabel('Cluster')
        for jj = 1:SYSTEM.Nclusters(ii)
            Clustering{ii}{jj,1} = Users(idx==jj)';
        end
    end
end
Answer:
As you've noted, kmeans returns the cluster indices and the sum of squared distances within each cluster (along with the centroid of each cluster, but we don't need that in this instance).
Finding the cluster with the highest WCSS is easy. sumd is a k x 1 vector, where k is the number of clusters. With just two clusters you can easily see which value is larger, but if you have more clusters, you can use the I (index) return value from max:
[~, max_wcss_cluster] = max(sumd); % index is the second return value
At some point, you're probably going to need to know which observations in X are in a particular cluster. To list those rows of X, you would use the idx vector returned by kmeans and logical indexing:
cluster_number = 2; % find all observations in cluster 2
my_cluster = X(idx==cluster_number, :);
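Putting the two pieces together, the bisecting loop could look roughly like the sketch below. It is not taken from the original code: the synthetic X, the target count K, and the names worst, members, subIdx and subSumd are placeholders chosen for illustration. It assumes the default squared Euclidean distance (so sumd really is the WCSS) and that the cluster chosen for splitting has at least two points.

```matlab
% Minimal sketch of a bisecting k-means loop (illustrative names and data).
% Requires the Statistics and Machine Learning Toolbox for kmeans.
rng(0);                                   % reproducible synthetic data
X = [randn(40,2); randn(40,2) + 6; randn(40,2) - 6];
K = 4;                                    % hypothetical target number of clusters

[idx, ~, sumd] = kmeans(X, 2);            % first split into two clusters
for pp = 3:K
    [~, worst] = max(sumd);               % cluster with the highest WCSS
    members = find(idx == worst);         % row numbers of X in that cluster
    % Bisect only the rows that belong to the selected cluster
    [subIdx, ~, subSumd] = kmeans(X(members, :), 2);
    idx(members(subIdx == 2)) = pp;       % second half gets the new label pp
    sumd(worst) = subSumd(1);             % first half keeps the old label
    sumd(pp) = subSumd(2);                % WCSS of the newly created cluster
end
```

After the loop, idx holds labels from 1 to K and sumd holds one WCSS value per label, so a lookup such as Users(idx==jj) should keep working unchanged. Inside the question's function, K would correspond to SYSTEM.Nclusters(ii).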