I know that before any clustering we need to scale the data.
But I want to ask whether the KElbowVisualizer method does the scaling by itself, or whether I should scale the data before giving it to the method.
I already searched the documentation for this method but did not find an answer. If you know it, please share it with me. Thank you.
CodePudding user response:
I looked at the implementation of KElbowVisualizer in yellowbrick/cluster/elbow.py on GitHub, and I haven't found any code inside the fit method (line 306) that scales the X variables.
# https://github.com/DistrictDataLabs/yellowbrick/blob/main/yellowbrick/cluster/elbow.py
# ...
def fit(self, X, y=None, **kwargs):
    """
    Fits n KMeans models where n is the length of ``self.k_values_``,
    storing the silhouette scores in the ``self.k_scores_`` attribute.
    The "elbow" and silhouette score corresponding to it are stored in
    ``self.elbow_value`` and ``self.elbow_score`` respectively.
    This method finishes up by calling draw to create the plot.
    """
    self.k_scores_ = []
    self.k_timers_ = []
    self.kneedle = None
    self.knee_value = None

    if self.locate_elbow:
        self.elbow_value_ = None
        self.elbow_score_ = None

    for k in self.k_values_:
        # Compute the start time for each model
        start = time.time()

        # Set the k value and fit the model
        self.estimator.set_params(n_clusters=k)
        self.estimator.fit(X, **kwargs)

        # Append the time and score to our plottable metrics
        self.k_timers_.append(time.time() - start)
        self.k_scores_.append(self.scoring_metric(X, self.estimator.labels_))
# ...
So you need to scale your data (the X matrix) yourself before passing it to KElbowVisualizer().fit().
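As a minimal sketch of that workflow (the data and feature scales below are made up purely for illustration; StandardScaler is just one reasonable choice of scaler):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from yellowbrick.cluster import KElbowVisualizer

# Toy data with features on very different scales (hypothetical example)
rng = np.random.RandomState(0)
X = rng.rand(200, 3) * np.array([1, 100, 10000])

# Scale the features yourself -- KElbowVisualizer will not do it for you
X_scaled = StandardScaler().fit_transform(X)

# Run the elbow search on the already-scaled data
visualizer = KElbowVisualizer(KMeans(random_state=0), k=(2, 10))
visualizer.fit(X_scaled)
visualizer.show()

If you want to keep the scaling and clustering steps together, you can also wrap them in a scikit-learn Pipeline and pass the pipeline's transformed output to the visualizer in the same way.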