I have a distance matrix:
array('d', [188.61516889752, 226.68716730362135, 188.96015266132167])
I would like to add labels to the matrix before performing hierarchical clustering using scipy.
I produce a UPGMA dendrogram from the distance matrix using:
from scipy.cluster.hierarchy import average, fcluster
#from scipy.spatial.distance import pdist
outDND=average(distanceMatrix)
I have tried adding the labels to the dendrogram using:
from scipy.cluster.hierarchy import average, fcluster
#from scipy.spatial.distance import pdist
outDND=average(distanceMatrix, labels=['A','B','C'])
But that does not work. I get the error:
TypeError: average() got an unexpected keyword argument 'labels'
How can I add labels to 'distanceMatrix' and have them carry through to outDND?
Answer:
It looks like you're missing a couple of steps between "create the distance matrix" and "create the dendrogram".
See this other StackOverflow question for several worked examples.
In general, scipy and the underlying numpy tend not to include labels in their data structures (unlike, say, pandas, which does track labels). That means you're responsible for keeping a separate list of labels and for working out the correct order and references.
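Because the labels live outside scipy's arrays, one way to keep track of them is to index a plain Python list with the integer IDs scipy uses. This is a minimal sketch, assuming the labels ['A', 'B', 'C'] are in the same order as the observations the distances were computed from. It relies on two conventions: scipy's condensed distance matrix lists the pairs in the order (0,1), (0,2), (1,2), and in a linkage matrix any index below n refers to an original observation while index n + k refers to the cluster created in row k. The helper names pair_distances and describe_merges are hypothetical, for illustration only.

```python
from array import array
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import average

labels = ['A', 'B', 'C']  # assumption: same order as the observations
distanceMatrix = array('d', [188.61516889752, 226.68716730362135, 188.96015266132167])
y = np.asarray(distanceMatrix)  # condensed form: one entry per pair


def pair_distances(y, labels):
    """Hypothetical helper: attach the two labels to each condensed entry.

    The condensed form walks the upper triangle row by row, which is the
    same order itertools.combinations yields for range(n).
    """
    pairs = combinations(range(len(labels)), 2)
    return {(labels[i], labels[j]): d for (i, j), d in zip(pairs, y)}


def describe_merges(Z, labels):
    """Hypothetical helper: list which labelled members each linkage row joins."""
    n = len(labels)
    members = {i: [lab] for i, lab in enumerate(labels)}  # leaves are 0..n-1
    merges = []
    for k, (a, b, dist, _count) in enumerate(Z):
        left, right = members[int(a)], members[int(b)]
        members[n + k] = left + right  # row k creates cluster id n + k
        merges.append((left, right, dist))
    return merges


print(pair_distances(y, labels))
for left, right, dist in describe_merges(average(y), labels):
    print(left, '+', right, 'at', round(float(dist), 3))
```

The same list indexing works for any scipy output that reports observation indices, so the labels never have to be stored inside the matrix itself.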
The steps you'll need are:
- Compute the condensed distance matrix (which you've done). Drop the "labels" argument from the average() call, though: average() has no such parameter, which is what raises the TypeError.
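- Use the scipy.cluster.hierarchy.linkage() function to build the hierarchy from that distance matrix. The average() call you already have does this step: it is shorthand for linkage(distanceMatrix, method='average'), so outDND is already the linkage matrix.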
- Display the resulting linkage matrix using scipy.cluster.hierarchy.dendrogram(). This is the step where you can supply your labels, via its "labels" argument.