Using dataframe values to color labels with Python

Time:06-25

I'm trying to customize a phylogenetic tree based on a tree file and a dataframe. The tree file and the dataframe share the same IDs; for example, GCA_021406745.1_ASM2140674v1 appears in both. The dataframe looks like this:

GCA_000375645.1_ASM37564v1  20
GCA_900543265.1_UMGS547 20
GCA_000614355.1_ASM61435v1  7
GCA_000766005.1_ASM76600v1  7

The second column is the cluster value, and I want to use it to color the tip labels of my phylogenetic tree, for example "1" = red, "2" = green, and so on. For this I'm using Toytree, a Python library for phylogenetic tree manipulation: https://toytree.readthedocs.io/en/latest/index.html
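A minimal sketch of loading that table and assigning one color per cluster, assuming the file is whitespace-separated with no header row. The column names ID and cluster and the palette hex values are hypothetical choices (not part of Toytree); if there are more clusters than palette entries, the palette is cycled, so colors would repeat.

```python
import io

import pandas as pd

# The four rows shown above; in practice this would be the path to your file.
raw = """\
GCA_000375645.1_ASM37564v1  20
GCA_900543265.1_UMGS547 20
GCA_000614355.1_ASM61435v1  7
GCA_000766005.1_ASM76600v1  7
"""

# Whitespace-separated, no header row; the column names are hypothetical.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, names=["ID", "cluster"])

# Arbitrary hex palette; cycled with % if there are more clusters than colors.
palette = ["#d6557c", "#5384a3", "#6aa84f", "#e69138", "#8e7cc3"]

# One color per distinct cluster value, assigned in sorted order.
clusters = sorted(df["cluster"].unique().tolist())
colormap = {c: palette[i % len(palette)] for i, c in enumerate(clusters)}
print(colormap)
```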

Specifically, I'm using tip_labels_colors to customize the labels. The node-label styling example in the Toytree docs (https://toytree.readthedocs.io/en/latest/8-styling.html#Node-labels-styling) does this by building a list of hex color values from the tip labels:

colorlist = ["#d6557c" if "rex" in tip else "#5384a3" for tip in rtre.get_tip_labels()]
rtre.draw(
    tip_labels_align=True,
    tip_labels_colors=colorlist
);

That conditional picks one color if "rex" appears in the label and another color otherwise. I want to do the same thing using the cluster values from my dataframe: build the same kind of colorlist, but with one color per cluster value. I haven't managed to get this working, so I'd appreciate an idea or some pseudocode. Here is a minimal example using data from Toytree:
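One way to generalize the two-way if/else is to replace it with dictionary lookups: tip label to cluster, then cluster to color. This is a sketch under stated assumptions: the hard-coded tips list stands in for rtre.get_tip_labels(), cluster_of is a hypothetical name (from a DataFrame it could be built with dict(zip(df["ID"], df["cluster"])), column names also hypothetical), and the last tip is a made-up ID used to show the gray fallback.

```python
# Stand-in for rtre.get_tip_labels(); IDs taken from the table above.
tips = [
    "GCA_000375645.1_ASM37564v1",
    "GCA_000614355.1_ASM61435v1",
    "GCA_900543265.1_UMGS547",
    "GCA_000766005.1_ASM76600v1",
    "GCA_999999999.1_unknown",  # hypothetical tip with no row in the table
]

# Tip label -> cluster value (hypothetical name; could come from the DataFrame).
cluster_of = {
    "GCA_000375645.1_ASM37564v1": 20,
    "GCA_900543265.1_UMGS547": 20,
    "GCA_000614355.1_ASM61435v1": 7,
    "GCA_000766005.1_ASM76600v1": 7,
}

# Cluster value -> hex color (arbitrary choices).
colormap = {20: "#d6557c", 7: "#5384a3"}
fallback = "#999999"  # used when a tip has no cluster or a cluster has no color

# Same shape as the "rex" list comprehension, but the color comes from lookups.
colorlist = [colormap.get(cluster_of.get(tip), fallback) for tip in tips]
print(colorlist)
```

The resulting list would be passed unchanged as tip_labels_colors=colorlist; its order follows the tip order, which is what the draw call expects.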

import toytree
import toyplot
import numpy as np

# a tree to use for examples
url = "https://eaton-lab.org/data/Cyathophora.tre"
rtre = toytree.tree(url).root(wildcard='prz')

Using these lines, you can customize the labels of the tree with two different colors.

# make list of hex color values based on tip labels
colorlist = ["#d6557c" if "rex" in tip else "#5384a3" for tip in rtre.get_tip_labels()]
rtre.draw(
    tip_labels_align=True,
    tip_labels_colors=colorlist
);

This example uses the condition "rex" in the label to choose a color. What I need instead is to color each label according to its cluster value in my dataframe.

Answer:

  • make a dictionary mapping cluster values to colors:
    colormap = {20:"#d6557c", 7:"#5384a3",...}
  • iterate over the tip labels: for ID in rtre.get_tip_labels():
  • for each ID, filter the DataFrame and get its cluster value (the filter returns a Series, so take the single value with .iloc[0]):
    cluster_value = df.loc[df['ID'] == ID, 'cluster_value_column_name'].iloc[0]
  • use the cluster value to look up the color:
    color = colormap[cluster_value]
  • accumulate the colors in a list and pass that list as tip_labels_colors.
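The steps above can be sketched as a runnable loop. This assumes hypothetical column names ID and cluster, and uses a hard-coded tips list (deliberately in a different order from the DataFrame) in place of rtre.get_tip_labels().

```python
import pandas as pd

# Rows from the question; the column names are hypothetical.
df = pd.DataFrame(
    {
        "ID": [
            "GCA_000375645.1_ASM37564v1",
            "GCA_900543265.1_UMGS547",
            "GCA_000614355.1_ASM61435v1",
            "GCA_000766005.1_ASM76600v1",
        ],
        "cluster": [20, 20, 7, 7],
    }
)

colormap = {20: "#d6557c", 7: "#5384a3"}

# Stand-in for rtre.get_tip_labels(); order deliberately differs from df.
tips = [
    "GCA_000766005.1_ASM76600v1",
    "GCA_000375645.1_ASM37564v1",
    "GCA_000614355.1_ASM61435v1",
    "GCA_900543265.1_UMGS547",
]

colorlist = []
for tip in tips:
    # The boolean filter returns a Series; .iloc[0] takes its single value.
    cluster_value = df.loc[df["ID"] == tip, "cluster"].iloc[0]
    colorlist.append(colormap[cluster_value])

print(colorlist)
```

In the real script the list would go to rtre.draw(tip_labels_align=True, tip_labels_colors=colorlist). Note that .iloc[0] raises IndexError for a tip with no matching row, so every tip label must appear in the DataFrame.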

Alternatively, the colors can be added to the DataFrame as a new column using Series.map:

df['colors'] = df['cluster_value_column_name'].map(colormap)
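A small illustration of Series.map with a dict, assuming hypothetical column names ID and cluster. Any cluster value missing from the dict maps to NaN, so this sketch fills those with a neutral gray; the last row is a made-up tip whose cluster (3) has no color assigned.

```python
import pandas as pd

# Hypothetical column names; the last row is a made-up tip with cluster 3.
df = pd.DataFrame(
    {
        "ID": [
            "GCA_000375645.1_ASM37564v1",
            "GCA_000614355.1_ASM61435v1",
            "hypothetical_tip",
        ],
        "cluster": [20, 7, 3],
    }
)

colormap = {20: "#d6557c", 7: "#5384a3"}

# map() with a dict yields NaN for values not in the dict;
# fillna() replaces those with gray so every row has a valid color.
df["colors"] = df["cluster"].map(colormap).fillna("#999999")
print(df)
```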

The DataFrame can then be sorted into the same order as rtre.get_tip_labels(), and df['colors'].to_list() passed as tip_labels_colors.
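One way to do that sort is an ordered categorical whose categories are the tip labels, so sort_values() orders rows by position in the tip list instead of alphabetically. A sketch, assuming hypothetical column names ID and cluster and a hard-coded tips list standing in for rtre.get_tip_labels(); it only lines up correctly when every tip appears exactly once in the DataFrame, which the assert checks.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [
            "GCA_000375645.1_ASM37564v1",
            "GCA_900543265.1_UMGS547",
            "GCA_000614355.1_ASM61435v1",
            "GCA_000766005.1_ASM76600v1",
        ],
        "cluster": [20, 20, 7, 7],
    }
)
colormap = {20: "#d6557c", 7: "#5384a3"}
df["colors"] = df["cluster"].map(colormap)

# Stand-in for rtre.get_tip_labels().
tips = [
    "GCA_000766005.1_ASM76600v1",
    "GCA_000375645.1_ASM37564v1",
    "GCA_000614355.1_ASM61435v1",
    "GCA_900543265.1_UMGS547",
]

# Category order = tip order, so sorting follows the tree's label order.
df["ID"] = pd.Categorical(df["ID"], categories=tips, ordered=True)
df = df.sort_values("ID")

# Sanity check: rows must match the tips one-to-one for the colors to line up.
assert df["ID"].tolist() == tips
colorlist = df["colors"].to_list()
print(colorlist)
```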

Some related questions on sorting by a custom order:
  • Sorting by a custom list in pandas
  • Sort column in Pandas DataFrame by specific order
  • Sorting a pandas DataFrame by the order of a list
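Instead of sorting, a different technique, DataFrame/Series reindex, also puts the colors in tip order, and it tolerates tips that have no row (they come back as NaN). This is a sketch assuming hypothetical column names ID and cluster, a hard-coded tips list in place of rtre.get_tip_labels(), and one made-up tip to show the fallback. reindex raises an error if the DataFrame contains duplicate IDs, so each ID should appear once.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [
            "GCA_000375645.1_ASM37564v1",
            "GCA_900543265.1_UMGS547",
            "GCA_000614355.1_ASM61435v1",
            "GCA_000766005.1_ASM76600v1",
        ],
        "cluster": [20, 20, 7, 7],
    }
)
colormap = {20: "#d6557c", 7: "#5384a3"}
df["colors"] = df["cluster"].map(colormap)

# Stand-in for rtre.get_tip_labels(); the last ID is hypothetical.
tips = [
    "GCA_000766005.1_ASM76600v1",
    "GCA_000375645.1_ASM37564v1",
    "GCA_000614355.1_ASM61435v1",
    "GCA_900543265.1_UMGS547",
    "hypothetical_tip",
]

# Index colors by ID, reorder to the tip list, and gray out tips with no row.
colorlist = df.set_index("ID")["colors"].reindex(tips).fillna("#999999").to_list()
print(colorlist)
```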
