Should we scale the data before using KElbowVisualizer for clustering in Python?

Time:05-30

I know that the data should be scaled before clustering.

But I want to ask whether KElbowVisualizer does the scaling itself, or whether I should scale the data before passing it in.

I have already searched the documentation for this method but did not find an answer. If you find one, please share it with me. Thank you.

CodePudding user response:

I looked at the implementation of KElbowVisualizer in yellowbrick/cluster/elbow.py on GitHub and haven't found any code in the fit function (line 306) that scales the X variables.

# https://github.com/DistrictDataLabs/yellowbrick/blob/main/yellowbrick/cluster/elbow.py
#...
    def fit(self, X, y=None, **kwargs):
        """
        Fits n KMeans models where n is the length of ``self.k_values_``,
        storing the silhouette scores in the ``self.k_scores_`` attribute.
        The "elbow" and silhouette score corresponding to it are stored in
        ``self.elbow_value`` and ``self.elbow_score`` respectively.
        This method finishes up by calling draw to create the plot.
        """

        self.k_scores_ = []
        self.k_timers_ = []
        self.kneedle = None
        self.knee_value = None

        if self.locate_elbow:
            self.elbow_value_ = None
            self.elbow_score_ = None

        for k in self.k_values_:
            # Compute the start time for each  model
            start = time.time()

            # Set the k value and fit the model
            self.estimator.set_params(n_clusters=k)
            self.estimator.fit(X, **kwargs)

            # Append the time and score to our plottable metrics
            self.k_timers_.append(time.time() - start)
            self.k_scores_.append(self.scoring_metric(X, self.estimator.labels_))
#...

So the visualizer fits the estimator on X exactly as given, and you need to scale your data (the X argument) yourself before passing it to KElbowVisualizer().fit().
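As a minimal sketch of that workflow, assuming scikit-learn's StandardScaler is an acceptable scaler for your features: the helper names scale_features and plot_elbow, the toy data, and the k range are all made up for illustration and do not come from yellowbrick itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def scale_features(X):
    """Standardize each column to zero mean and unit variance."""
    return StandardScaler().fit_transform(X)


def plot_elbow(X_scaled, k_range=(2, 10)):
    """Fit KElbowVisualizer on data that has already been scaled."""
    # Imported here so the scaling helper can be used without yellowbrick.
    from yellowbrick.cluster import KElbowVisualizer

    visualizer = KElbowVisualizer(KMeans(n_init=10, random_state=0), k=k_range)
    visualizer.fit(X_scaled)  # the visualizer uses X exactly as given
    visualizer.show()
    return visualizer


# Toy data (hypothetical): two features on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(50_000, 15_000, 200),  # e.g. an income-like column
    rng.normal(40, 12, 200),          # e.g. an age-like column
])

# Scale first, then hand the scaled array to the visualizer.
X_scaled = scale_features(X)
```

Calling plot_elbow(X_scaled) would then draw the elbow plot on the standardized features. Whether StandardScaler or another scaler (for example MinMaxScaler) is the better choice depends on your data.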
