How to set a limitation on hierarchical clustering

Time:06-09

I have a dataset like this:

# c1 c2 c3 c4 c5
r1 3 7 4 3 5
r2 4 2 6 5 2
r3 8 4 4 6 2
r4 9 4 5 6 2
r5 3 7 4 5 8
r6 2 6 9 1 10

The elements in each row give the distance between locations. For example, the distance between r1 and c2 is 7 km.

My question is: how can I set a limit that prevents clustering of elements whose values are bigger than 5? In other words, the hierarchical algorithm should not include them in its calculations. Please help me solve this problem. Thanks.

CodePudding user response:

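Using sklearn's AgglomerativeClustering, pass 5 as the distance_threshold parameter. Clusters whose linkage distance is at or above this value are not merged. Note that distance_threshold only works when n_clusters is set to None: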

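from sklearn.cluster import AgglomerativeClustering

# distance_threshold requires n_clusters=None; ward linkage always uses Euclidean distance
cluster = AgglomerativeClustering(n_clusters=None, linkage='ward', distance_threshold=5)
labels = cluster.fit_predict(data_scaled)  # data_scaled: your (scaled) feature matrix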

For more information, check this blog: https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/
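
To make the suggestion concrete, here is a minimal sketch that runs it on the table from the question. It assumes each row r1..r6 is treated as a feature vector and that the values are used unscaled (the answer's data_scaled is not defined, so the array X and the names model and labels are placeholders of mine). Keep in mind that the threshold applies to the linkage distance between clusters, not to individual cell values, so this may or may not match what the question intends.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Rows r1..r6 and columns c1..c5 from the question, used as-is (no scaling: an assumption)
X = np.array([
    [3, 7, 4, 3, 5],
    [4, 2, 6, 5, 2],
    [8, 4, 4, 6, 2],
    [9, 4, 5, 6, 2],
    [3, 7, 4, 5, 8],
    [2, 6, 9, 1, 10],
], dtype=float)

# n_clusters must be None when distance_threshold is given
model = AgglomerativeClustering(n_clusters=None, linkage="ward", distance_threshold=5)
labels = model.fit_predict(X)

print("number of clusters:", model.n_clusters_)
print("labels:", labels)
# distances_ holds the linkage distance of every merge in the full tree;
# merges at or above the threshold are the ones that were cut
print("merge distances:", model.distances_)
```

Rows that end up with the same label were merged below the threshold; rows left alone in their own cluster had no partner closer than 5 in linkage distance.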
