I have a dataframe like this:
 | pathways | genes |
---|---|---|
1 | REACTOME_2_LTR_CIRCLE_FORMATION | ENSG00000175334 |
2 | REACTOME_A_TETRASACCHARIDE_LINKER_SEQUENCE_IS_REQUIRED_FOR_GAG_SYNTHESIS | ENSG00000109956 |
3 | REACTOME_ABC_FAMILY_PROTEINS_MEDIATED_TRANSPORT | ENSG00000072849 |
5 | REACTOME_CELL_CYCLE | ENSG00000196230 |
12 | REACTOME_CELL_CYCLE | ENSG00000101162 |
13 | REACTOME_CELL_CYCLE | ENSG00000137267 |
I would like to create a vector, c(), of all the pathways for a single gene.
I tried group_by() in dplyr, but it is not working.
sub_pathway=sub_path%>%
group_by(genes)%>%
summarise(n())
It gives me just the count. If I use only summarise(), it gives me only the list of genes.
I also tried a loop, but it has been running since yesterday.
structure(list(pathways = c("REACTOME_2_LTR_CIRCLE_FORMATION", "REACTOME_A_TETRASACCHARIDE_LINKER_SEQUENCE_IS_REQUIRED_FOR_GAG_SYNTHESIS", "REACTOME_ABC_FAMILY_PROTEINS_MEDIATED_TRANSPORT", "REACTOME_CELL_CYCLE", "REACTOME_CELL_CYCLE", "REACTOME_CELL_CYCLE"), genes = c("ENSG00000175334", "ENSG00000109956", "ENSG00000072849", "ENSG00000196230", "ENSG00000101162", "ENSG00000137267")), row.names = c(1L, 2L, 3L, 5L, 12L, 13L), class = "data.frame")
Answer:
Since your data is in a data.frame, you cannot put all pathways for a gene into a single cell of an ordinary column. Each column in the table is a vector, and atomic vectors in R are flat: they cannot be nested.
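The flatness is easy to check in base R. In this minimal illustration (the values are made up), c() concatenates its arguments into one flat vector, whereas list() keeps each argument intact:

```r
# c() concatenates its arguments into one flat character vector
flat <- c(c("a", "b"), "c")
length(flat)    # the inner vector is not preserved as a unit

# list() keeps the inner vector as a single element
nested <- list(c("a", "b"), "c")
length(nested)  # the first element is itself a vector
```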
However, you can use list columns: lists are R’s way of nesting structures. Therefore, the following works:
library(dplyr)
# Collect all pathways of each gene into one list element per gene
sub_pathway = sub_path %>%
group_by(genes) %>%
summarise(pathways = list(pathways))
The result is a table with one row per gene, and the pathways column is a list with, for each row, one vector of pathways.
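As a rough sketch of how to get a plain vector back out of the list column, assuming dplyr is installed. The data here is made up for illustration (the question's sample has only one pathway per gene), and the names GENE_A, GENE_B and gene_a_pathways are hypothetical:

```r
library(dplyr)

# Made-up example data: GENE_A appears in two pathways, GENE_B in one
sub_path <- data.frame(
  pathways = c("PATHWAY_1", "PATHWAY_2", "PATHWAY_1"),
  genes    = c("GENE_A", "GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

# One row per gene; the pathways column is a list of character vectors
sub_pathway <- sub_path %>%
  group_by(genes) %>%
  summarise(pathways = list(pathways))

# [[ ]] extracts the character vector stored in the row for one gene
gene_a_pathways <- sub_pathway$pathways[[which(sub_pathway$genes == "GENE_A")]]
gene_a_pathways
```

Single brackets would return a list of length one rather than the vector itself, which is why `[[ ]]` is used for the extraction.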
Unfortunately, R doesn’t make it very easy to work with list columns, so the result can be awkward to handle. For example, if you want to print or export the data, it might be more convenient to merge the pathways into a single character string per gene:
sub_path %>%
group_by(genes) %>%
summarise(pathways = paste(pathways, collapse = ', '))
# genes pathways
# <chr> <chr>
# 1 ENSG00000000419 REACTOME_DISEASES_ASSOCIATED_WITH_GLYCOSYLATION_PRECURSOR_BIOSYNTHESIS…
# 2 ENSG00000000938 REACTOME_FCGAMMA_RECEPTOR_FCGR_DEPENDENT_PHAGOCYTOSIS, REACTOME_FCGR_A…
What’s more convenient depends on what you need to do with the data afterwards.
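If all that is needed is a lookup of pathways by gene, one possible alternative that avoids list columns is a named list built with base R's split(). This is a different technique from the dplyr approach above, sketched here on made-up data (GENE_A, GENE_B and pathways_by_gene are hypothetical names):

```r
# Made-up example data: GENE_A appears in two pathways, GENE_B in one
sub_path <- data.frame(
  pathways = c("PATHWAY_1", "PATHWAY_2", "PATHWAY_1"),
  genes    = c("GENE_A", "GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

# split() groups the pathways vector by gene and returns a named list
pathways_by_gene <- split(sub_path$pathways, sub_path$genes)

# Look up the character vector of pathways for a single gene by name
pathways_by_gene[["GENE_A"]]
```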