R - convert each row of a matrix into a character vector and save as a named list

I have a matrix containing biological pathways (rows) and corresponding genes (columns). If a gene is present in a pathway the cell contains 1, otherwise 0. See example below:

mat=matrix(c(0,0,1,0,1,1,1,1,1), nrow = 3, ncol = 3)

row.names(mat) = c("pathwayX", "pathwayY", "pathwayZ")

colnames(mat) = c("Gene1", "Gene2", "Gene3")

	Gene1	Gene2	Gene3
pathwayX	0	0	1
pathwayY	0	1	1
pathwayZ	1	1	1

What I need is a character vector for each pathway listing its constituent genes, held in a list (e.g. named gene_sets). In this example this would be:

> gene_sets
$pathwayX
"Gene3"

$pathwayY
"Gene2" "Gene3"

$pathwayZ
"Gene1" "Gene2" "Gene3"

Additionally, I need character vectors describing the pathway name, held in a list (e.g. named description). In this example this would be:

> description
$pathwayX
"pathwayX"

$pathwayY
"pathwayY"

$pathwayZ
"pathwayZ"

Background: The lists of vectors are needed for the package pathfindR with custom input (https://github.com/egeulgen/pathfindR/wiki/Analysis-Using-Custom-Gene-Sets).

CodePudding user response：

Well done giving us a reproducible example. You can use the apply family of functions: lapply returns a list, sapply tries to simplify the result, and apply lets you choose whether the function is applied over rows (1) or columns (2) of a matrix or data.frame via the MARGIN argument. (lapply and sapply, when given a data.frame, iterate over its columns.)

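mat <- as.data.frame(mat)  # optional; apply() also accepts the matrix directly
# MARGIN = 1 runs the function on each row; keep the column names where the row equals 1
gene_sets <- apply(mat, 1, function(x) colnames(mat)[x == 1])
# One single-element character vector per pathway, named after the pathway
description <- lapply(row.names(mat), function(x) x)
names(description) <- row.names(mat)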
> gene_sets
$pathwayX
[1] "Gene3"

$pathwayY
[1] "Gene2" "Gene3"

$pathwayZ
[1] "Gene1" "Gene2" "Gene3"

> description
$pathwayX
[1] "pathwayX"

$pathwayY
[1] "pathwayY"

$pathwayZ
[1] "pathwayZ"

Not sure I follow your logic regarding the description list, but this seems to give your expected result.
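
One caveat with apply: it returns a list here only because the rows yield vectors of different lengths. If every pathway happened to contain the same number of genes, apply would simplify the result to a character vector or matrix instead. A minimal sketch that always returns a named list, assuming the same mat as in the question (rebuilt here so the snippet runs on its own):

```r
# Rebuild the example matrix from the question
mat <- matrix(c(0, 0, 1, 0, 1, 1, 1, 1, 1), nrow = 3, ncol = 3,
              dimnames = list(c("pathwayX", "pathwayY", "pathwayZ"),
                              c("Gene1", "Gene2", "Gene3")))

# Iterate over the row names; lapply always returns a list,
# and naming the input vector carries the names over to the result
gene_sets <- lapply(
  setNames(rownames(mat), rownames(mat)),
  function(p) colnames(mat)[mat[p, ] == 1]
)

# Named list with one single-element character vector per pathway
description <- as.list(setNames(rownames(mat), rownames(mat)))
```

Both objects print the same way as shown above; the only difference is that the result no longer depends on how many genes each pathway has.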

CodePudding user response：

Is it also possible to reverse the listing into a binary matrix?

Sticking to the initial example, to go from

gene_sets = list(pathwayX = c("Gene3"), pathwayY = c("Gene2", "Gene3"), pathwayZ = c("Gene1", "Gene2", "Gene3"))

> gene_sets
$pathwayX
"Gene3"

$pathwayY
"Gene2" "Gene3"

$pathwayZ
"Gene1" "Gene2" "Gene3"

to the binary matrix

	Gene1	Gene2	Gene3
pathwayX	0	0	1
pathwayY	0	1	1
pathwayZ	1	1	1
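
For the reverse direction, one possible approach (a sketch, not the only way) is to start from an all-zero matrix with the pathways as row names and the union of all genes as column names, then set the cells for each pathway's genes to 1. The name all_genes is introduced here for illustration, and the gene names are assumed to be capitalised as in the printed list:

```r
gene_sets <- list(
  pathwayX = c("Gene3"),
  pathwayY = c("Gene2", "Gene3"),
  pathwayZ = c("Gene1", "Gene2", "Gene3")
)

# Union of all genes across pathways (hypothetical helper name)
all_genes <- sort(unique(unlist(gene_sets)))

# Start with zeros everywhere: pathways as rows, genes as columns
mat <- matrix(0, nrow = length(gene_sets), ncol = length(all_genes),
              dimnames = list(names(gene_sets), all_genes))

# Mark each pathway's member genes with 1, indexing by row and column name
for (p in names(gene_sets)) {
  mat[p, gene_sets[[p]]] <- 1
}

mat
```

Columns are ordered by sort(), which is alphabetical, so Gene10 would come before Gene2; supply your own gene order for all_genes if that matters.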