I have one large data frame with columns such as name, position, expression level, q value and so on. Most objects with the same name appear several times with different expression levels. I want to filter these repeats: if a name's expression levels point in opposite directions, i.e. a mix of up-regulated (+) and down-regulated (-) values, remove all of its rows; if the repeats differ but are all up-regulated (+) or all down-regulated (-), keep them. Here is an example of my file:
df1<-data.frame(gene.name=c( "DEC1","DEC1","DEC1","ATP","ANXA2","ANXA1","ANXA1","ANXA1"),
expression.level=c(2.01,0.5,-1.56,3.1,0.67,0.1,1.2,3),
q.value=c(0.001,0.002,0.0001,0.9,0.00001,0.9,0.0002,0.002))
The desired output looks like this:
output<-data.frame(gene.name=c( "ATP","ANXA2","ANXA1","ANXA1","ANXA1"),
expression.level=c(3.1,0.67,0.1,1.2,3),
q.value=c(0.9,0.00001,0.9,0.0002,0.002))
Thanks in advance for your help.
CodePudding user response:
We can use sign() to check whether the values are positive, negative, or zero, then use filter() to keep only the groups whose values all share the same sign.
library(dplyr)
df1 %>%
  group_by(gene.name) %>%
  filter(length(unique(sign(expression.level))) == 1) %>%
  ungroup()
gene.name expression.level q.value
1 ATP 3.10 9e-01
2 ANXA2 0.67 1e-05
3 ANXA1 0.10 9e-01
4 ANXA1 1.20 2e-04
5 ANXA1 3.00 2e-03
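To see why DEC1 is dropped, it may help to count the distinct signs per gene in base R first. This is only a sketch of the same idea as above; the names n_signs, consistent and result are made up for illustration.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  gene.name = c("DEC1", "DEC1", "DEC1", "ATP", "ANXA2", "ANXA1", "ANXA1", "ANXA1"),
  expression.level = c(2.01, 0.5, -1.56, 3.1, 0.67, 0.1, 1.2, 3),
  q.value = c(0.001, 0.002, 0.0001, 0.9, 0.00001, 0.9, 0.0002, 0.002)
)

# Number of distinct signs per gene: 1 means every value points the same way,
# 2 or more means the gene mixes up- and down-regulated values
n_signs <- tapply(sign(df1$expression.level), df1$gene.name,
                  function(s) length(unique(s)))

# Keep only the rows of genes with a single sign (hypothetical helper names)
consistent <- names(n_signs)[n_signs == 1]
result <- df1[df1$gene.name %in% consistent, ]
```

Here n_signs is 2 for DEC1 (it has both positive and negative values) and 1 for the other genes, which is the condition the filter() call tests within each group.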
CodePudding user response:
Using ave() you can do this with a one-liner (the \(x) lambda shorthand needs R 4.1 or later; use function(x) on older versions).
df1[with(df1, ave(expression.level, gene.name, FUN=\(x) length(unique(sign(x))))) == 1, ]
# gene.name expression.level q.value
# 4 ATP 3.10 9e-01
# 5 ANXA2 0.67 1e-05
# 6 ANXA1 0.10 9e-01
# 7 ANXA1 1.20 2e-04
# 8 ANXA1 3.00 2e-03
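One edge case worth noting: sign() returns 0 for an exact zero, so a gene with values 0 and 1.5 has two distinct signs and would be dropped by the approaches here. If zeros should instead be treated as neutral, a possible variant is sketched below. That treatment is an assumption, since the question does not say how zeros should be handled, and df2 and same_direction are made up for illustration.

```r
# Hypothetical data: gene A contains an exact zero, gene B mixes directions
df2 <- data.frame(
  gene.name = c("A", "A", "B", "B"),
  expression.level = c(0, 1.5, -1, 2)
)

# TRUE when no value contradicts the others (zeros count as neutral)
same_direction <- function(x) all(x >= 0) || all(x <= 0)

# ave() recycles the per-group result over each group's rows;
# the logical is coerced to 1/0 because expression.level is numeric
keep <- with(df2, ave(expression.level, gene.name, FUN = same_direction)) == 1
result2 <- df2[keep, ]
```

With this variant gene A is kept and gene B is removed, whereas the sign()-based test would remove both.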
CodePudding user response:
Using data.table
library(data.table)
setDT(df1)[df1[, .I[uniqueN(sign(expression.level)) == 1], gene.name]$V1]
-output
gene.name expression.level q.value
<char> <num> <num>
1: ATP 3.10 9e-01
2: ANXA2 0.67 1e-05
3: ANXA1 0.10 9e-01
4: ANXA1 1.20 2e-04
5: ANXA1 3.00 2e-03