Writing a function that conditionally adds something to a dataframe

Time:11-25

I have a data frame of genes with their expression values in one column. If the minimum expression of a gene is higher than a threshold, I want to add the name of that gene to an initially empty data frame, and I want to use a function to loop through all genes in the original data frame:

Data input:

data <- data.frame(gene = c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c"),
          expression = c("30", "25", "350", "25", "1", "50", "50", "40", "25", "26", "260", "360"))

   gene expression
1     a         30
2     a         25
3     a        350
4     a         25
5     b          1
6     b         50
7     b         50
8     b         40
9     c         25
10    c         26
11    c        260
12    c        360

Here I make the empty data frame, define the function, and loop over the genes:

library(dplyr)

test <- as.data.frame(matrix(ncol = 1, nrow = 0))

potential <- function(z){
  # filter a specific gene (filter() and %>% come from dplyr)
  data_2 <- data %>% filter(gene %in% z)
  if(min(data_2$expression) > 25){
    test <<- cbind(test, z)
  }

}

for (i in unique(data$gene)){
  potential(i)
}

But this does not work. How do I keep adding something to a data frame if a condition is met?

Expected output:

data.frame(gene = c("a", "c"))
  gene
1    a
2    c

Thanks in advance.

Answer:

Get the minimum value per gene, then filter:

x <- aggregate(expression ~ gene, data = data, min)
x[ x$expression >= 25, "gene"]
#[1] "a" "c"

Note: using > 25 would return no genes, because the minimum for both a and c is exactly 25, so I used >= 25 instead.
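
If you do want to keep the loop, here is a minimal sketch of one way it could work. It rests on a few assumptions that are not in the question: the expression column is converted to numeric first (as character strings, min() compares alphabetically, so "260" sorts before "30"), the threshold uses >= 25 as above, and rows are appended with rbind() rather than cbind(). The helper name add_if_above and its threshold argument are made up for illustration. It returns the updated data frame instead of modifying a global with <<-.

```r
# Same data as in the question; expression is stored as character strings.
data <- data.frame(
  gene = c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c"),
  expression = c("30", "25", "350", "25", "1", "50", "50", "40",
                 "25", "26", "260", "360"),
  stringsAsFactors = FALSE
)

# Assumption: the values are meant to be numbers, so convert them
# before calling min() or comparing against the threshold.
data$expression <- as.numeric(data$expression)

# Start with an empty one-column data frame.
test <- data.frame(gene = character(0), stringsAsFactors = FALSE)

# Hypothetical helper: returns `result` with one row appended
# when the minimum expression of gene `z` reaches the threshold.
add_if_above <- function(result, z, threshold = 25) {
  gene_rows <- data[data$gene == z, ]
  if (min(gene_rows$expression) >= threshold) {
    result <- rbind(result, data.frame(gene = z, stringsAsFactors = FALSE))
  }
  result
}

# Loop over the genes and reassign the result each time.
for (g in unique(data$gene)) {
  test <- add_if_above(test, g)
}

test$gene
# [1] "a" "c"
```

Growing a data frame row by row is slow for large inputs, so the aggregate() approach is probably the better choice when there are many genes.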
