I have a matrix containing biological pathways (rows) and corresponding genes (columns). If a gene is present in a pathway, the cell contains 1; otherwise it contains 0. See the example below:
```r
mat=matrix(c(0,0,1,0,1,1,1,1,1), nrow = 3, ncol = 3)
row.names(mat) = c("pathwayX", "pathwayY", "pathwayZ")
colnames(mat) = c("Gene1", "Gene2", "Gene3")
```
| | Gene1 | Gene2 | Gene3 |
|---|---|---|---|
| pathwayX | 0 | 0 | 1 |
| pathwayY | 0 | 1 | 1 |
| pathwayZ | 1 | 1 | 1 |
What I need is, for each pathway, a character vector of its constituent genes, with these vectors held in a list (e.g. named `gene_sets`). In this example this would be:
```
> gene_sets
$pathwayX
"Gene3"
$pathwayY
"Gene2" "Gene3"
$pathwayZ
"Gene1" "Gene2" "Gene3"
```
Additionally, I need character vectors describing the pathway names, held in a list (e.g. named `description`). In this example this would be:
```
> description
$pathwayX
"pathwayX"
$pathwayY
"pathwayY"
$pathwayZ
"pathwayZ"
```
Background: These lists are needed for the package pathfindR with custom input (https://github.com/egeulgen/pathfindR/wiki/Analysis-Using-Custom-Gene-Sets).
User response:
Well done giving us a reproducible example. You can use the `apply` family of functions: `lapply` always gives you a list as output, `sapply` will try to simplify the result (e.g. to a vector or matrix), and `apply` lets you decide whether the function is applied over the rows (`MARGIN = 1`) or the columns (`MARGIN = 2`) of a matrix or data.frame (a data.frame is coerced to a matrix first). `lapply` and `sapply`, by contrast, iterate over the columns of a data.frame, because a data.frame is a list of columns.
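To make the `MARGIN` distinction concrete, here is a minimal sketch on a small toy data frame (`df` and the result names are hypothetical, not from the question). It contrasts `apply` over rows and columns with `sapply`/`lapply`, which walk the columns of a data.frame.

```r
# Toy data frame (hypothetical): three rows, two columns
df <- data.frame(a = 1:3, b = 4:6)

row_sums <- apply(df, 1, sum)  # MARGIN = 1: one value per row (5, 7, 9)
col_sums <- apply(df, 2, sum)  # MARGIN = 2: one value per column (a = 6, b = 15)

# A data.frame is a list of columns, so sapply/lapply visit the columns
s <- sapply(df, sum)           # simplified to a named vector
l <- lapply(df, sum)           # same traversal, but always returns a list
```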
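```r
mat <- as.data.frame(mat)
# MARGIN = 1: for each row (pathway), keep the column names (genes) whose cell equals 1
gene_sets <- apply(mat, 1, function(x) colnames(mat)[x==1])
# one-element character vector per pathway, named by the pathway
description <- lapply(row.names(mat), function(x) x)
names(description) <- row.names(mat)
```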
```
> gene_sets
$pathwayX
[1] "Gene3"
$pathwayY
[1] "Gene2" "Gene3"
$pathwayZ
[1] "Gene1" "Gene2" "Gene3"
> description
$pathwayX
[1] "pathwayX"
$pathwayY
[1] "pathwayY"
$pathwayZ
[1] "pathwayZ"
```
Not sure I follow your logic regarding the `description` list, but this seems to give your expected result.
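One caveat worth checking: `apply` only returns a list here because the pathways contain different numbers of genes. If every pathway had the same number of genes, `apply` would simplify the result into a character matrix. The sketch below demonstrates this on a hypothetical matrix `mat_eq` and shows a helper (the name `to_gene_sets` is hypothetical) that always returns a named list by using `lapply` over the row names. In R 4.1.0 and later, passing `simplify = FALSE` to `apply` is another option.

```r
# Original example matrix (pathways in rows, genes in columns)
mat <- matrix(c(0, 0, 1, 0, 1, 1, 1, 1, 1), nrow = 3, ncol = 3,
              dimnames = list(c("pathwayX", "pathwayY", "pathwayZ"),
                              c("Gene1", "Gene2", "Gene3")))

# Hypothetical matrix in which every pathway has exactly two genes
mat_eq <- matrix(c(1, 1, 0, 1, 0, 1, 0, 1, 1), nrow = 3, ncol = 3,
                 dimnames = list(c("pathwayA", "pathwayB", "pathwayC"),
                                 c("Gene1", "Gene2", "Gene3")))

# apply() simplifies equal-length results into a 2 x 3 matrix, not a list
res <- apply(mat_eq, 1, function(x) colnames(mat_eq)[x == 1])

# Hypothetical helper that always returns a named list of gene vectors
to_gene_sets <- function(m) {
  # setNames(nm = v) names each element of v by itself, so lapply keeps pathway names
  lapply(setNames(nm = rownames(m)), function(p) colnames(m)[m[p, ] == 1])
}

gene_sets    <- to_gene_sets(mat)
gene_sets_eq <- to_gene_sets(mat_eq)  # still a list, one element per pathway

# description: each pathway name mapped to itself
description <- as.list(setNames(nm = rownames(mat)))
```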
User response:
Is it also possible to do the reverse, i.e. turn the list back into a binary matrix? Sticking to the initial example, to go from
```r
gene_sets = list(pathwayX = c("Gene3"), pathwayY = c("Gene2", "Gene3"), pathwayZ = c("Gene1", "Gene2", "Gene3"))
```

```
> gene_sets
$pathwayX
"Gene3"
$pathwayY
"Gene2" "Gene3"
$pathwayZ
"Gene1" "Gene2" "Gene3"
```
to the binary matrix:

| | Gene1 | Gene2 | Gene3 |
|---|---|---|---|
| pathwayX | 0 | 0 | 1 |
| pathwayY | 0 | 1 | 1 |
| pathwayZ | 1 | 1 | 1 |
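One possible way to go back, sketched under the assumption that the matrix columns should be all distinct genes in `sort()` order (alphabetical, so e.g. "Gene10" would sort before "Gene2"). The names `genes` and `mat2` are hypothetical. For each pathway, `%in%` marks which genes it contains; `vapply` stacks those 0/1 vectors as columns, and `t()` turns them into rows.

```r
gene_sets <- list(
  pathwayX = c("Gene3"),
  pathwayY = c("Gene2", "Gene3"),
  pathwayZ = c("Gene1", "Gene2", "Gene3")
)

# All distinct genes, sorted, become the columns (assumed ordering)
genes <- sort(unique(unlist(gene_sets)))

# One 0/1 vector per pathway; vapply puts them in columns, t() makes them rows
mat2 <- t(vapply(gene_sets,
                 function(g) as.numeric(genes %in% g),
                 numeric(length(genes))))
colnames(mat2) <- genes
```

The row names come from the names of `gene_sets`, so pathways keep their labels without extra bookkeeping.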