How can I add labels to a distance matrix used to make a dendrogram and have the labels also show on

Time:11-29

I have a distance matrix:

array('d', [188.61516889752, 226.68716730362135, 188.96015266132167])

I would like to add labels to the matrix before performing hierarchical clustering with scipy.

I produce a UPGMA dendrogram from the distance matrix using:

from scipy.cluster.hierarchy import average, fcluster
#from scipy.spatial.distance import pdist

outDND=average(distanceMatrix)

I have tried adding the labels to the dendrogram using:

from scipy.cluster.hierarchy import average, fcluster
#from scipy.spatial.distance import pdist

outDND=average(distanceMatrix, labels=['A','B','C'])

But that does not work. I get the error:

TypeError: average() got an unexpected keyword argument 'labels'

How can I add labels to 'distanceMatrix' and have them carry through to outDND?

CodePudding user response:

It looks like you're missing a couple steps between "create the distance matrix" and "create the dendrogram".

See this other StackOverflow question for several worked examples.

In general, scipy and the underlying numpy do not store labels in their data structures (unlike, say, pandas, which does track labels). That means you're responsible for keeping a separate list of labels and for making sure its order matches the order of the observations behind the distance matrix.

The steps you'll need are:
