I have a phylogenetic tree with many tips and internal nodes, and a list of node ids from that tree stored in a separate table. I want to add a new column, children, to the table. To get the descendants (nodes and tips) of a node, I am using phangorn::Descendants(tree, NODEID, type = 'all'). Wrapping that call in length() gives the number of descendants. For example,
phangorn::Descendants(tree, 12514, type = 'all')
[1] 12515 12517 12516 5345 5346 5347 5343 5344
length(phangorn::Descendants(tree, 12514, type = 'all'))
[1] 8
I would like to take the 'nodes' column in my data frame and apply the call above, length(phangorn::Descendants(tree, 12514, type = 'all')), to each value, creating a new column in the data frame based on the input nodes.
Here is an example:
tests <- data.frame(nodes=c(12551, 12514, 12519))
length(phangorn::Descendants(tree, 12519, type = 'all'))
[1] 2
length(phangorn::Descendants(tree, 12514, type = 'all'))
[1] 8
length(phangorn::Descendants(tree, 12551, type = 'all'))
[1] 2
tests$children <- length(phangorn::Descendants(tree, tests$nodes, type = 'all'))
tests
nodes children
1 12551 3
2 12514 3
3 12519 3
As shown above, the number of children is the length of the data.frame and not the actual number of children calculated above. It should be:
tests
nodes children
1 12551 2
2 12514 8
3 12519 2
If you have any tips or ideas on how to make this behave as expected, that would be great. I have a feeling I have to use apply(), or index into the result before calling length(). Thank you in advance.
CodePudding user response:
You're super close! Here's one quick solution using sapply. There are other alternatives, but this one follows the structure of your question.
Generating some data
library(ape)
ntips <- 10
tree <- rtree(ntips)
targetNodes <- data.frame(nodes=seq(ntips + 1, ntips + tree$Nnode))
Note that I'm storing all the relevant nodes in the targetNodes object. This is equivalent to the following object in your question:
tests <- data.frame(nodes=c(12551, 12514, 12519))
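As a small sketch of why the original attempt produced 3 in every row: when Descendants is given several nodes at once, the result has one element per node, so length() counts the nodes queried rather than their descendants. The seed, the tree size, and the node ids 11 to 13 below are assumptions chosen only for illustration, and the lengths() shortcut assumes the multi-node result is a list, which is what the output in the question suggests.

```r
library(ape)

set.seed(1)                 # assumed seed, only for reproducibility
tree <- rtree(10)           # random tree: tips are 1..10, internal nodes 11..19

nodes <- c(11, 12, 13)      # three internal node ids (illustrative)
desc <- phangorn::Descendants(tree, nodes, type = "all")

length(desc)                # one element per queried node
lengths(desc)               # per-node descendant counts, if desc is a list
```

The second call is the quantity the question is after; the sapply approach computes the same thing one node at a time.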
Using sapply
Now, let's use sapply to repeat the same operation across all the relevant nodes in targetNodes:
targetNodes$children <- sapply(targetNodes$nodes, function(x) {
  length(phangorn::Descendants(tree, x, type = 'all'))
})
I'm saving the output of our sapply call by creating a new column in targetNodes.
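One possible variant, sketched here rather than offered as the definitive approach, swaps sapply for vapply, which additionally checks that every node yields exactly one integer. It assumes, as in the output shown in the question, that Descendants returns a plain vector when given a single node. The seed is an assumption added only for reproducibility.

```r
library(ape)

set.seed(1)                          # assumed seed, only for reproducibility
ntips <- 10
tree <- rtree(ntips)

# Internal node ids run from ntips + 1 to ntips + Nnode
targetNodes <- data.frame(nodes = seq(ntips + 1, ntips + tree$Nnode))

# Count descendants one node at a time; integer(1) declares the expected
# shape of each result, so an unexpected return value raises an error
targetNodes$children <- vapply(
  targetNodes$nodes,
  function(x) length(phangorn::Descendants(tree, x, type = "all")),
  integer(1)
)

targetNodes
```

The stricter return-type check is the only difference from sapply here; the resulting column should be the same.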
Good luck!