Is there a way to select certain rows in a dataset?

Time:05-02

I have a dataset containing a set of gene names and the expression of those genes in different cell types:

Gene name   Cell type   Expression
Gene X      Cell A      10
Gene X      Cell B      20
Gene X      Cell C      25
Gene X      Cell D      5
Gene Y      Cell A      7
Gene Y      Cell B      12
Gene Y      Cell C      16
Gene Y      Cell D      18
Gene Z      Cell A      15
Gene Z      Cell B      12
Gene Z      Cell C      16
Gene Z      Cell D      2

I want to identify only the genes whose expression in one cell type (e.g. Cell A) is greater than their expression in another cell type (e.g. Cell B). In the dataset above, this would be Gene Z. I have tried the group_by and filter functions in R, but once I group the data by gene I don't know how to compare the expression values of specific cell types within each gene. I would really appreciate it if you could tell me how to handle this, or whether there is another function in R for something like this. Thank you!

CodePudding user response:

If you want to do it with the tidyverse, you can first transform the data to a "wide" format, so that each cell type becomes its own column, and then use filter to compare the expression between two of those columns. Finally, pull the gene name column out and you'll get a vector.

library(tidyverse)

df %>% 
  pivot_wider(names_from = "Cell.type", values_from = "Expression") %>% 
  filter(`Cell A` > `Cell B`) %>% 
  pull(Gene.name)

[1] "Gene Z"
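For anyone who wants to run this end to end, here is a minimal reproducible sketch. It assumes the data frame is built by hand from the table in the question, with the dotted column names used above (Gene.name, Cell.type); those are the names read.table or read.csv would produce from headers containing spaces under the default check.names = TRUE. The names wide and hits are only illustrative.

```r
library(dplyr)
library(tidyr)

# Toy data copied from the question; column names are an assumption
# matching the dotted names used in the answer above.
df <- data.frame(
  Gene.name  = rep(c("Gene X", "Gene Y", "Gene Z"), each = 4),
  Cell.type  = rep(c("Cell A", "Cell B", "Cell C", "Cell D"), times = 3),
  Expression = c(10, 20, 25, 5, 7, 12, 16, 18, 15, 12, 16, 2),
  stringsAsFactors = FALSE
)

# One row per gene, one column per cell type.
wide <- df %>%
  pivot_wider(names_from = Cell.type, values_from = Expression)

# Keep genes where Cell A is higher than Cell B, then extract the names.
hits <- wide %>%
  filter(`Cell A` > `Cell B`) %>%
  pull(Gene.name)

print(hits)
```

The wide table has one row per gene, so the comparison inside filter is a plain column-against-column test.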

CodePudding user response:

One way is to pivot_wider your table so that every cell type gets its own column (df being your data frame, with the column names exactly as in your table):

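library(tidyr)
library(dplyr)  # provides %>% and filter()

# one column per cell type, filled with the expression values
df_wide <-
    df %>%
    pivot_wider(names_from = `Cell type`,
                values_from = Expression)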

Now you can filter:

df_wide %>%
    filter(`Cell A` > `Cell B`)
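Since the question mentions group_by and filter, the same comparison can probably also be done without reshaping. The sketch below is an alternative, not what either answer proposes. It assumes every gene has exactly one row for each of the two cell types being compared (otherwise the subsetting returns zero or several values and filter will fail), and it rebuilds the toy data with the space-containing column names used in this answer. The name hits is only illustrative.

```r
library(dplyr)

# Toy data from the question; check.names = FALSE keeps the spaces
# in the column names (an assumption about how df was created).
df <- data.frame(
  `Gene name` = rep(c("Gene X", "Gene Y", "Gene Z"), each = 4),
  `Cell type` = rep(c("Cell A", "Cell B", "Cell C", "Cell D"), times = 3),
  Expression  = c(10, 20, 25, 5, 7, 12, 16, 18, 15, 12, 16, 2),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

hits <- df %>%
  group_by(`Gene name`) %>%
  # Within each gene, compare the single Cell A value to the single
  # Cell B value; the length-1 result keeps or drops the whole group.
  filter(Expression[`Cell type` == "Cell A"] >
         Expression[`Cell type` == "Cell B"]) %>%
  ungroup() %>%
  distinct(`Gene name`) %>%
  pull(`Gene name`)

print(hits)
```

This keeps the data in long format, which can be convenient if the rows for the selected genes are needed afterwards: dropping the distinct and pull steps leaves all four rows of each matching gene.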