I have a dataset like this:
| # | c1 | c2 | c3 | c4 | c5 |
|---|---|---|---|---|---|
| r1 | 3 | 7 | 4 | 3 | 5 |
| r2 | 4 | 2 | 6 | 5 | 2 |
| r3 | 8 | 4 | 4 | 6 | 2 |
| r4 | 9 | 4 | 5 | 6 | 2 |
| r5 | 3 | 7 | 4 | 5 | 8 |
| r6 | 2 | 6 | 9 | 1 | 10 |
Each value is the distance between the row location and the column location; for example, the distance between r1 and c2 is 7 km.
Now my question is: how can I set a constraint that prevents elements whose values are greater than 5 from being clustered? In other words, the hierarchical algorithm should not include them in its calculations. Please help me solve this problem. Thanks.
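One way to read the constraint is as a mask over the table: any location pair farther apart than 5 km is marked as unusable before any clustering happens. A minimal pure-Python sketch, assuming the table is stored row by row as below; the names `TABLE`, `MAX_KM` and `allowed_pairs` are hypothetical, and this only filters the distances, it does not cluster them:

```python
# Distances in km from the table above: TABLE[i][j] is the distance ROWS[i] -> COLS[j].
ROWS = ["r1", "r2", "r3", "r4", "r5", "r6"]
COLS = ["c1", "c2", "c3", "c4", "c5"]
TABLE = [
    [3, 7, 4, 3, 5],
    [4, 2, 6, 5, 2],
    [8, 4, 4, 6, 2],
    [9, 4, 5, 6, 2],
    [3, 7, 4, 5, 8],
    [2, 6, 9, 1, 10],
]
MAX_KM = 5  # hypothetical limit: pairs farther apart than this are excluded


def allowed_pairs(rows, cols, table, max_km):
    """Return {(row, col): distance} for every pair no farther apart than max_km."""
    return {
        (r, c): d
        for r, dists in zip(rows, table)
        for c, d in zip(cols, dists)
        if d <= max_km
    }


kept = allowed_pairs(ROWS, COLS, TABLE, MAX_KM)
excluded = len(ROWS) * len(COLS) - len(kept)
print(len(kept), excluded)  # prints "19 11"
```

Any later clustering step would then only be allowed to link pairs that appear in `kept`.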
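Using scikit-learn's agglomerative clustering, pass 5 as the `distance_threshold` parameter. Clusters are not merged once their linkage distance is at or above this threshold, and `distance_threshold` can only be used together with `n_clusters=None`: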
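    from sklearn.cluster import AgglomerativeClustering
    # data_scaled: 2-D array of shape (n_samples, n_features), e.g. the scaled table
    # distance_threshold requires n_clusters=None; ward linkage always uses Euclidean distance
    cluster = AgglomerativeClustering(n_clusters=None, linkage='ward', distance_threshold=5)
    labels = cluster.fit_predict(data_scaled)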
For more information, see this blog post: https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/
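To see what a distance threshold does without installing scikit-learn, here is a minimal pure-Python sketch of threshold-stopped agglomerative clustering. It assumes each row of the question's table is treated as a 5-dimensional point with Euclidean distances between rows, and it stops merging once the closest pair of clusters is at or above the threshold. It uses single linkage, not the ward linkage in the snippet above, and `threshold_clusters` is a hypothetical helper, not part of any library:

```python
import math

# Rows of the question's table, treated as 5-dimensional points (an assumption).
POINTS = {
    "r1": [3, 7, 4, 3, 5],
    "r2": [4, 2, 6, 5, 2],
    "r3": [8, 4, 4, 6, 2],
    "r4": [9, 4, 5, 6, 2],
    "r5": [3, 7, 4, 5, 8],
    "r6": [2, 6, 9, 1, 10],
}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def threshold_clusters(points, threshold):
    """Single-linkage clustering that never merges two clusters whose
    closest members are at or above `threshold` apart."""
    clusters = [[name] for name in sorted(points)]
    while len(clusters) > 1:
        # Single linkage: cluster distance = smallest distance between members.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    euclidean(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d >= threshold:
            break  # every remaining merge would be at or above the threshold
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


print(threshold_clusters(POINTS, 5))
# [['r1', 'r5'], ['r2'], ['r3', 'r4'], ['r6']]
```

Here r2 stays on its own because its nearest neighbour, r3, is exactly 5.0 away, and the stopping rule is "at or above" the threshold.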