How can I choose the cluster with the highest WCSS value via sumd and idx?

I am applying the bisecting k-means algorithm to cluster users for each antenna beam.

The problem arises after splitting the cluster containing all users in two. At this point I have to select the cluster with the highest WCSS, but I don't know how to do it.
I thought I could use the sumd and idx values returned by kmeans.

function [Clustering, SYSTEM] = CLUST_Bkmeans(kk, SYSTEM, USERS, ChannelMatrix)
    Clustering = cell(SYSTEM.Nbeams,1);
    UserPool = (1:SYSTEM.Nusers)';
    
    Channel_real = real(ChannelMatrix);
    Channel_imag = imag(ChannelMatrix);
    
    avg_clusterSize = kk;
    
    for ii=1:SYSTEM.Nbeams              
        Users          = UserPool(USERS.BeamIndex==ii);

        %Matrix of channel coefficient built as [real part | imaginary part]
        Users_real = Channel_real(Users,:);
        Users_imag = Channel_imag(Users,:);
        
        X = [Users_real Users_imag];
        
        SYSTEM.Nclusters(ii) = ceil(size(Users,1)/avg_clusterSize);
        
        Clustering{ii} = cell(SYSTEM.Nclusters(ii),1);
        
        %Bisecting k-means clustering of X 
        
        [idx,C,sumd] = kmeans(X,2); %first division in two cluster
    
        for pp = 3:SYSTEM.Nclusters(ii)
            %kmeans applied to cluster with higher WCSS
        end
        
        % silhouette(X,idx)
        % xlabel('Silhouette Value')
        % ylabel('Cluster')
    
        for jj = 1:SYSTEM.Nclusters(ii)
            Clustering{ii}{jj,1} = Users(idx==jj)';
        end    
    end
end

Answer:

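As you've noted, kmeans returns the cluster index of each observation (idx) and the within-cluster sum of point-to-centroid distances for each cluster (sumd). With the default squared Euclidean distance, sumd is exactly the WCSS. kmeans also returns the centroid of each cluster (C), but we don't need that in this instance.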

Finding the cluster with the highest WCSS is easy. sumd is a k-by-1 vector, where k is the number of clusters. With just two clusters you can simply compare the two values, but for any number of clusters you can use the second (index) return value of max:

[~, max_wcss_cluster] = max(sumd);   % index is the second return value

At some point, you're probably going to need to know which observations in X are in a particular cluster. To list those rows of X, you would use the idx vector returned by kmeans and logical indexing:

cluster_number = 2;   % find all observations in cluster 2
my_cluster = X(idx==cluster_number, :);
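
Putting the two pieces together, the bisecting loop from the question could be sketched roughly as follows. This is only a minimal sketch, not a tested drop-in for your function: the names K, worst, members, sub_idx and sub_sumd are hypothetical, the data in X is made up so the snippet runs on its own, and it assumes kmeans from the Statistics and Machine Learning Toolbox. Inside your function, K would be SYSTEM.Nclusters(ii) and X would be your channel matrix.

```matlab
% Sketch only: bisecting k-means on X up to K clusters.
rng(1);                                  % reproducible made-up data
X = [randn(20,2); randn(20,2) + 5; randn(20,2) - 5];
K = 4;                                   % hypothetical target number of clusters

[idx, ~, sumd] = kmeans(X, 2);           % first split into two clusters

for pp = 3:K
    [~, worst] = max(sumd);              % cluster with the highest WCSS
    members = find(idx == worst);        % rows of X that belong to it

    if numel(members) < 2
        break;                           % a single point cannot be split
    end

    % Split only that cluster; sub_idx is 1 or 2 for each member row
    [sub_idx, ~, sub_sumd] = kmeans(X(members, :), 2);

    % Rows in sub-cluster 1 keep the label 'worst',
    % rows in sub-cluster 2 receive the new label pp
    idx(members(sub_idx == 2)) = pp;

    % Keep sumd aligned with the labels stored in idx
    sumd(worst) = sub_sumd(1);
    sumd(pp)    = sub_sumd(2);
end
```

Because idx ends up holding labels 1 to K for the rows of X, the final loop in the question (Users(idx==jj)') should work unchanged. Note that the sumd returned by each inner kmeans call only describes the two new sub-clusters, which is why it is copied back into the full sumd vector.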