I have a data frame of genes with their expression values in one column. For each gene, I want to check whether its minimal expression is higher than a threshold and, if so, add the gene's name to an initially empty data frame. I want to wrap this check in a function and loop over all genes in the original data frame:
data input:
data <- data.frame(gene = c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c"),
expression = c("30", "25", "350", "25", "1", "50", "50", "40", "25", "26", "260", "360"))
   gene expression
1     a         30
2     a         25
3     a        350
4     a         25
5     b          1
6     b         50
7     b         50
8     b         40
9     c         25
10    c         26
11    c        260
12    c        360
Here I make the empty data frame, the function, and the loop:
test <- as.data.frame(matrix(ncol = 1, nrow = 0))
library(dplyr)  # provides %>% and filter()
potential <- function(z) {
  # filter a specific gene
  data_2 <- data %>% filter(gene %in% z)
  if (min(data_2$expression) > 25) {
    test <<- cbind(test, z)
  }
}
for (i in unique(data$gene)) {
  potential(i)
}
But this does not work. How do I keep adding rows to a data frame if a condition is met?
expected output:
data.frame(gene = c("a", "c"))
  gene
1    a
2    c
Thanks in advance.
CodePudding user response:
Get minimum value per gene, then filter:
x <- aggregate(expression ~ gene, data = data, min)
x[ x$expression >= 25, "gene"]
#[1] "a" "c"
Note: > 25 would give no genes (the minimum for both a and c is exactly 25), so I used >= 25 instead.
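If you do want to keep the loop from the question, here is a minimal sketch of how it could look. It assumes the cutoff is meant to be inclusive (>= 25, as above) and converts expression to numeric first, because in the question's data it is stored as character, where comparisons are lexicographic. The key change is rbind() (append a row) in place of cbind() (add a column). The names threshold and gene_min are mine, not from the question:

```r
# Same data as in the question; expression is stored as character there.
data <- data.frame(
  gene = c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c"),
  expression = c("30", "25", "350", "25", "1", "50", "50", "40",
                 "25", "26", "260", "360"),
  stringsAsFactors = FALSE
)

# Convert to numeric so min() and >= compare numbers, not strings.
data$expression <- as.numeric(data$expression)

threshold <- 25  # assumed cutoff, applied with >= as in the answer above

# Start with an empty data frame that already has the target column.
test <- data.frame(gene = character(0), stringsAsFactors = FALSE)

for (g in unique(data$gene)) {
  # Minimal expression of the current gene.
  gene_min <- min(data$expression[data$gene == g])
  if (gene_min >= threshold) {
    # rbind() appends a row; cbind() would try to add a column instead.
    test <- rbind(test, data.frame(gene = g, stringsAsFactors = FALSE))
  }
}

test
#   gene
# 1    a
# 2    c
```

Growing a data frame row by row gets slow on large inputs, so for real data the aggregate() version above is probably the better choice; the loop is only meant to show what was going wrong.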