I'm trying to customize a phylogenetic tree based on a tree file and a data frame. The tree file and the data frame share the same IDs; for example, GCA_021406745.1_ASM2140674v1 appears in both. The data frame looks like this:
GCA_000375645.1_ASM37564v1 20
GCA_900543265.1_UMGS547 20
GCA_000614355.1_ASM61435v1 7
GCA_000766005.1_ASM76600v1 7
The second column is the cluster value. I want to use this value to color the tip labels of my phylogenetic tree, for example, "1" = red, "2" = green, and so on. To do that, I'm using a Python library for phylogenetic tree manipulation: Toytree https://toytree.readthedocs.io/en/latest/index.html
Specifically, I'm using tip_labels_colors to customize the labels. For example, the documentation (https://toytree.readthedocs.io/en/latest/8-styling.html#Node-labels-styling) does this by making a list of hex color values based on the tip labels:
colorlist = ["#d6557c" if "rex" in tip else "#5384a3" for tip in rtre.get_tip_labels()]
rtre.draw(
tip_labels_align=True,
tip_labels_colors=colorlist
);
That if statement checks whether "rex" is in the label. I want to do the same with my data frame, but using the cluster value: build the same kind of color list, with one color for each cluster value.
I have not been able to do that successfully, so I need some help, maybe with an idea or pseudocode.
Here is a minimal example, using data from toytree:
import toytree
import toyplot
import numpy as np
# a tree to use for examples
url = "https://eaton-lab.org/data/Cyathophora.tre"
rtre = toytree.tree(url).root(wildcard='prz')
Using these lines, you can customize the labels of the tree with two different colors.
# make list of hex color values based on tip labels
colorlist = ["#d6557c" if "rex" in tip else "#5384a3" for tip in rtre.get_tip_labels()]
rtre.draw(
tip_labels_align=True,
tip_labels_colors=colorlist
);
The example colors each label depending on whether it contains "rex". I need to color my labels based on the cluster values in my data frame instead.
CodePudding user response:
- Make a dictionary mapping cluster values to colors:
colormap = {20:"#d6557c", 7:"#5384a3",...}
- Iterate over the return value of rtre.get_tip_labels():
for ID in rtre.get_tip_labels():
- For each ID, filter the DataFrame by that ID and get the cluster value. The filter returns a Series, so take its single element with .iloc[0]:
cluster_value = df.loc[df['ID'] == ID,'cluster_value_column_name'].iloc[0]
- Use the cluster value to get the color:
color = colormap[cluster_value]
- Accumulate the colors in a list.
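The steps above can be sketched as follows. This is only a minimal sketch under stated assumptions: the column names ID and cluster, the fallback color, and the hard-coded tip_labels list (a stand-in for rtre.get_tip_labels()) are hypothetical and not taken from the question. It also builds the ID-to-cluster lookup once instead of filtering the DataFrame for every tip.

```python
import pandas as pd

# Hypothetical data frame; the column names "ID" and "cluster" are assumptions.
df = pd.DataFrame({
    "ID": [
        "GCA_000375645.1_ASM37564v1",
        "GCA_900543265.1_UMGS547",
        "GCA_000614355.1_ASM61435v1",
        "GCA_000766005.1_ASM76600v1",
    ],
    "cluster": [20, 20, 7, 7],
})

# Stand-in for rtre.get_tip_labels(); the order deliberately differs from df.
tip_labels = [
    "GCA_000614355.1_ASM61435v1",
    "GCA_000375645.1_ASM37564v1",
    "GCA_000766005.1_ASM76600v1",
    "GCA_900543265.1_UMGS547",
]

# One color per cluster value.
colormap = {20: "#d6557c", 7: "#5384a3"}

# Fallback for tips missing from the data frame (an assumption, pick any color).
default_color = "#000000"

# Build the ID -> cluster lookup once.
cluster_by_id = df.set_index("ID")["cluster"].to_dict()

# Accumulate one color per tip, in tip order.
colorlist = [
    colormap.get(cluster_by_id.get(tip), default_color)
    for tip in tip_labels
]
```

The resulting colorlist should then be usable in the same way as in the question, that is, rtre.draw(tip_labels_align=True, tip_labels_colors=colorlist).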
Alternatively, the colors can be added to the DataFrame using Series.map:
df['colors'] = df['cluster_value_column_name'].map(colormap)
The DataFrame could then be sorted into the same order as rtre.get_tip_labels(), and df['colors'].to_list() could be used as the color list.
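One way to get the DataFrame into tip order without an explicit sort is to index it by ID and reindex with the tip labels. The sketch below assumes the IDs are unique in the DataFrame (reindex raises an error on duplicate index labels); the column names ID and cluster, the fallback color, and the tip_labels list standing in for rtre.get_tip_labels() are hypothetical.

```python
import pandas as pd

# Hypothetical data frame; the column names "ID" and "cluster" are assumptions.
df = pd.DataFrame({
    "ID": [
        "GCA_000375645.1_ASM37564v1",
        "GCA_900543265.1_UMGS547",
        "GCA_000614355.1_ASM61435v1",
        "GCA_000766005.1_ASM76600v1",
    ],
    "cluster": [20, 20, 7, 7],
})

colormap = {20: "#d6557c", 7: "#5384a3"}

# Stand-in for rtre.get_tip_labels(), including one tip absent from df.
tip_labels = [
    "GCA_000766005.1_ASM76600v1",
    "GCA_900543265.1_UMGS547",
    "not_in_dataframe",
    "GCA_000375645.1_ASM37564v1",
    "GCA_000614355.1_ASM61435v1",
]

# Map each cluster value to its color.
df["colors"] = df["cluster"].map(colormap)

# Reorder by tip label; tips missing from df become NaN and get a fallback color.
colorlist = (
    df.set_index("ID")["colors"]
    .reindex(tip_labels)
    .fillna("#000000")
    .to_list()
)
```

Because reindex follows the order of tip_labels, the list lines up with the tree's tips even when the DataFrame rows are in a different order.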
Some sorting methods...
sorting by a custom list in pandas
Sort column in Pandas DataFrame by specific order
Sorting a pandas DataFrame by the order of a list