I have a dataframe that contains several columns
# Temp Data
library(dplyr)
df <- as.data.frame(matrix(seq(1:40),ncol=6,nrow=40))
colnames(df) <- c("A_logFC", "B_logFC", "C_logFC", "A_qvalue", "B_qvalue", "C_qvalue")
I would like to filter out those rows that have a qvalue lower than a threshold in all conditions (A, B, C).
I could do the obvious by filtering each column separately
df %>%
filter(A_qvalue < 0.05 & B_qvalue < 0.05 & C_qvalue < 0.05)
but the real dataframe has 15 columns with q-values.
I also tried reshaping the dataframe (as found here)
df_ID = DEGs_df %>% mutate(ID = 1:n())
df_ID %>%
select(contains("qval"), ID) %>%
gather(variable, value, -ID) %>%
filter(value < 0.05) %>%
semi_join(df_ID)
but then I cannot filter for those rows whose q-value is below the threshold in all conditions.
Conceptually, it would be something like
df %>%
filter(grep("q_value",.) < 0.05)
but this does not work either.
Any suggestions on how to solve that? Thank you in advance!
CodePudding user response:
You can filter
multiple columns at once using if_all
:
library(dplyr)
df %>%
filter(if_all(matches("_qvalue"), ~ . < 0.05))
In this case I use the filtering condition x < 0.05
on all columns whose name matches _qvalue
.
Your second approach can also work if you group by ID
first and then use all
inside filter:
df_ID = df %>% mutate(ID = 1:n())
df_ID %>%
select(contains("qval"), ID) %>%
gather(variable, value, -ID) %>%
group_by(ID) %>%
filter(all(value < 0.05)) %>%
semi_join(df_ID, by = "ID")