I have a dataset containing a set of gene names and the expression of those genes in different cell types:
Gene name | Cell type | Expression |
---|---|---|
Gene X | Cell A | 10 |
Gene X | Cell B | 20 |
Gene X | Cell C | 25 |
Gene X | Cell D | 5 |
Gene Y | Cell A | 7 |
Gene Y | Cell B | 12 |
Gene Y | Cell C | 16 |
Gene Y | Cell D | 18 |
Gene Z | Cell A | 15 |
Gene Z | Cell B | 12 |
Gene Z | Cell C | 16 |
Gene Z | Cell D | 2 |
I want to identify only the genes whose expression in one cell type (e.g. Cell A) is greater than their expression in another cell type (e.g. Cell B). In the dataset above, that would be Gene Z. I have tried the group_by and filter functions in R, but once I group the data by gene I don't know how to compare the expression values of specific cell types within each gene. I would really appreciate it if you could tell me how to handle this, or whether there is another function in R for this kind of comparison. Thank you!
CodePudding user response:
If you want to do it in tidyverse, you can first transform the data to a "wide" format and use filter to compare the expression of genes between cell types. Finally, pull the column out and you'll get a vector.
library(tidyverse)
df %>%
pivot_wider(names_from = "Cell.type", values_from = "Expression") %>%
filter(`Cell A` > `Cell B`) %>%
pull(Gene.name)
[1] "Gene Z"
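For a reproducible version, here is a minimal sketch that rebuilds the question's table as df and runs the same pipeline. The column names Gene.name and Cell.type are an assumption (they are what you would typically get after reading the table with default name checking); adjust them if yours differ.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question; column names are assumed
df <- data.frame(
  Gene.name  = rep(c("Gene X", "Gene Y", "Gene Z"), each = 4),
  Cell.type  = rep(c("Cell A", "Cell B", "Cell C", "Cell D"), times = 3),
  Expression = c(10, 20, 25, 5, 7, 12, 16, 18, 15, 12, 16, 2),
  stringsAsFactors = FALSE
)

genes <- df %>%
  # one row per gene, one column per cell type
  pivot_wider(names_from = Cell.type, values_from = Expression) %>%
  # keep genes expressed more strongly in Cell A than in Cell B
  filter(`Cell A` > `Cell B`) %>%
  pull(Gene.name)

print(genes)
```

To compare a different pair of cell types, only the two backticked column names in filter need to change.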
CodePudding user response:
One way is to pivot_wider your table so that every cell type gets its own column, with df being your data frame:
library(tidyr)
library(dplyr)  # needed for filter()
df_wide <-
df %>%
pivot_wider(names_from = `Cell type`,
values_from = Expression
)
Now you can filter:
df_wide %>%
filter(`Cell A` > `Cell B`)
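If you would rather stay with the group_by and filter approach mentioned in the question, the comparison can also be done without reshaping. This is only a sketch: it assumes the columns are literally named `Gene name`, `Cell type` and Expression, and that every gene has exactly one row for each of the two cell types being compared (a gene with a missing or duplicated row would make the comparison fail).

```r
library(dplyr)

# Example data copied from the question; column names are assumed
df <- data.frame(
  `Gene name` = rep(c("Gene X", "Gene Y", "Gene Z"), each = 4),
  `Cell type` = rep(c("Cell A", "Cell B", "Cell C", "Cell D"), times = 3),
  Expression  = c(10, 20, 25, 5, 7, 12, 16, 18, 15, 12, 16, 2),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

genes <- df %>%
  group_by(`Gene name`) %>%
  # within each gene, compare the Cell A value with the Cell B value;
  # the single TRUE/FALSE result keeps or drops all rows of that gene
  filter(Expression[`Cell type` == "Cell A"] >
           Expression[`Cell type` == "Cell B"]) %>%
  ungroup() %>%
  distinct(`Gene name`) %>%
  pull(`Gene name`)

print(genes)
```

The wide format is usually easier to read when several cell types are compared at once; the grouped version avoids creating one column per cell type.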