Is there a way to select certain rows in a dataset?

I have a dataset containing a set of gene names and the expression of those genes in different cell types:

Gene name	Cell type	Expression
Gene X	Cell A	10
Gene X	Cell B	20
Gene X	Cell C	25
Gene X	Cell D	5
Gene Y	Cell A	7
Gene Y	Cell B	12
Gene Y	Cell C	16
Gene Y	Cell D	18
Gene Z	Cell A	15
Gene Z	Cell B	12
Gene Z	Cell C	16
Gene Z	Cell D	2

I want to identify the genes whose expression in one cell type (e.g. Cell A) is greater than their expression in another cell type (e.g. Cell B). In the dataset above, that would be Gene Z. I have tried the group_by and filter functions in R, but I don't know how to compare the expression values of specific cell types within a gene once the data is grouped by gene. I would really appreciate it if you could tell me how to handle this, or whether another function in R can do something like this. Thank you!

CodePudding user response:

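If you want to do this with the tidyverse, first reshape the data to a "wide" format so that each cell type becomes its own column, then use filter to compare the expression in the two cell types. Finally, pull out the gene name column to get a character vector. The code below assumes the columns were read in as Gene.name and Cell.type.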

library(tidyverse)

df %>% 
  pivot_wider(names_from = "Cell.type", values_from = "Expression") %>% 
  filter(`Cell A` > `Cell B`) %>% 
  pull(Gene.name)

[1] "Gene Z"
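To try this without the original file, here is a minimal reproducible sketch that builds the example data by hand. The dotted column names `Gene.name` and `Cell.type` are an assumption (they match what base R produces when it reads headers containing spaces with the default `check.names = TRUE`), and the variable name `genes` is only for illustration.

```r
library(dplyr)
library(tidyr)

# Example data from the question; dotted column names are assumed
df <- data.frame(
  Gene.name  = rep(c("Gene X", "Gene Y", "Gene Z"), each = 4),
  Cell.type  = rep(c("Cell A", "Cell B", "Cell C", "Cell D"), times = 3),
  Expression = c(10, 20, 25, 5, 7, 12, 16, 18, 15, 12, 16, 2)
)

genes <- df %>%
  # one row per gene, one column per cell type
  pivot_wider(names_from = Cell.type, values_from = Expression) %>%
  # keep genes whose expression in Cell A exceeds that in Cell B
  filter(`Cell A` > `Cell B`) %>%
  pull(Gene.name)

genes
```

To compare a different pair of cell types, change the two backticked column names in the filter call.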

CodePudding user response:

One way is to use pivot_wider on your table so that every cell type gets its own column. With df being your data frame:

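library(dplyr)  # provides %>% and filter()
library(tidyr)  # provides pivot_wider()

# Spread the cell types into columns: one row per gene
df_wide <-
    df %>%
    pivot_wider(names_from = `Cell type`,
                values_from = Expression)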

Now you can filter:

# Keep genes whose expression in Cell A exceeds that in Cell B
df_wide %>%
    filter(`Cell A` > `Cell B`)
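Since the question asked about group_by and filter specifically, the same comparison can also be sketched without reshaping. This is only an illustration under one assumption: every gene has exactly one row for each of the two cell types being compared (a gene that is missing one of them would make the filter fail). The name `result` is hypothetical.

```r
library(dplyr)

# Example data from the question, keeping the original column names
df <- data.frame(
  "Gene name"  = rep(c("Gene X", "Gene Y", "Gene Z"), each = 4),
  "Cell type"  = rep(c("Cell A", "Cell B", "Cell C", "Cell D"), times = 3),
  "Expression" = c(10, 20, 25, 5, 7, 12, 16, 18, 15, 12, 16, 2),
  check.names = FALSE
)

result <- df %>%
  group_by(`Gene name`) %>%
  # within each gene, compare the Cell A value with the Cell B value;
  # the single TRUE/FALSE is recycled to keep or drop the whole group
  filter(Expression[`Cell type` == "Cell A"] >
         Expression[`Cell type` == "Cell B"]) %>%
  ungroup() %>%
  distinct(`Gene name`) %>%
  pull(`Gene name`)

result
```

Dropping the last two steps (distinct and pull) keeps all rows of the matching genes in long format, which can be handy if the other cell types are needed afterwards.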