Using a function on a column from tree file class Phylo

Time:06-28

I have a phylogenetic tree with many tips and internal nodes. I have a list of node ids from the tree. These are part of a separate table. I want to add a new column to the table, children. To get the descendants (nodes and tips), I am using phangorn::Descendants(tree, NODEID, type = 'all'). I can add length to get the number of descendants. For example,

phangorn::Descendants(tree, 12514, type = 'all')
[1] 12515 12517 12516  5345  5346  5347  5343  5344

length(phangorn::Descendants(tree, 12514, type = 'all'))
[1] 8

I would like to very simply take the column in my dataframe 'nodes', and use the function above length(phangorn::Descendants(tree, 12514, type = 'all')) to create a new column in the dataframe based off the input nodes.

Here is an example:

tests <- data.frame(nodes=c(12551, 12514, 12519))
length(phangorn::Descendants(tree, 12519, type = 'all'))
[1] 2
length(phangorn::Descendants(tree, 12514, type = 'all'))
[1] 8
length(phangorn::Descendants(tree, 12551, type = 'all'))
[1] 2
tests$children <- length(phangorn::Descendants(tree, tests$nodes, type = 'all'))
tests
  nodes children
1 12551        3
2 12514        3
3 12519        3

As shown above, the number of children equals the number of rows in the data.frame, not the actual number of children calculated above. It should be:

tests
  nodes children
1 12551        2
2 12514        8
3 12519        2

If you have any tips or ideas on how I can make this behave as expected, that would be great. I have a feeling I have to use apply() or index inside before using the length() function. Thank you in advance.

Answer:

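You're super close! When Descendants() is given a vector of nodes, it returns a list with one element per node, so length() counts the nodes you passed in rather than each node's descendants. Here's one quick solution using sapply. There are other alternatives, but this one follows the structure of your question.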

Generating some data

library(ape)

ntips <- 10
tree <- rtree(ntips)
# internal nodes are numbered from ntips + 1 to ntips + tree$Nnode
targetNodes <- data.frame(nodes=seq(ntips + 1, ntips + tree$Nnode))

Note that I'm storing the ids of all internal nodes in the targetNodes object. It plays the same role as the following object in your question:

tests <- data.frame(nodes=c(12551, 12514, 12519))

Using sapply

Now, let's use sapply to repeat the same operation across all the relevant nodes in targetNodes:

targetNodes$children <- sapply(targetNodes$nodes, function(x){
  length(phangorn::Descendants(tree, x, type = 'all'))
})

I'm saving the output of our sapply function by creating a new column in targetNodes.
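
As one possible alternative, here is a vectorised sketch. Since Descendants() returns a list when it is given several nodes at once, base R's lengths() can count the descendants inside each list element directly. The seed value and the name desc_list are illustrative assumptions, and the random tree stands in for your real one.

```r
library(ape)       # rtree()
library(phangorn)  # Descendants()

set.seed(1)        # arbitrary seed, only to make the random tree reproducible
ntips <- 10
tree <- rtree(ntips)
targetNodes <- data.frame(nodes = seq(ntips + 1, ntips + tree$Nnode))

# With a vector of nodes, Descendants() returns a list (one element per node)
desc_list <- Descendants(tree, targetNodes$nodes, type = "all")

# length() counts the list elements, i.e. the number of nodes queried
length(desc_list)

# lengths() counts the descendants inside each element
targetNodes$children <- lengths(desc_list)
targetNodes
```

One caveat: when only a single node is passed, Descendants() returns a plain vector rather than a list (as in the single-node calls in the question), so the sapply version above is probably the safer choice if your table might have just one row.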

Good luck!
