I have a data frame containing 92 variables and 1900 observations. Essentially I have species ID as my variable of interest and relative abundance as the other variables.
Columns 69:92 (23 columns) are quality control variables that range from 0-10. I'd like to filter my data so that each species has at least 4 quality control variables that are >5. The best thing I've come up with until now is not what I want but at least filters the data on the basis that I have at least 1 variable with QC>5 and the sum of the row is larger than 100:
df_QC <- df %>%
filter_at(vars(contains("QC")), any_vars(. >= 5))%>%
rowwise()%>%
mutate(total = sum(c_across(69:92)))
filter(total >99.9)
Is there a way to solve it the way I'd like?
To reiterate, I want to select species if QC> 5 for at least 4 of the QC variables.
My data is a bit large so let's work on a smaller dataset to replicate the problem. iris data but add some quality control variables:
df <-cbind(iris, data.frame(qc1 = sample(0:10, size=150, replace=TRUE),
qc2 = sample(0:10, size=150, replace=TRUE),
qc3 = sample(0:10, size=150, replace=TRUE),
qc4 = sample(0:10, size=150, replace=TRUE),
qc5 = sample(0:10, size=150, replace=TRUE)))
#Then similarly I would do a filtering that is not really what I want:
df_QC <- df %>%
filter_at(vars(contains("QC")), any_vars(. >= 5))%>%
rowwise()%>%
mutate(total = sum(c_across(6:10)))%>%
filter(total >15)
So with this example data, how could I filter out Species with QC>5 for at least 3 QCs?
Thanks in advance for any help!
CodePudding user response:
Here's a way to do it using rowSums
. Basically counting the number of values >= 5 for every row, and checking if this number is higher or equal to 4.
dplyr
library(dplyr)
df %>%
filter(rowSums(.[str_c("qc", 1:5)] >= 5) >= 4)
base R
df[rowSums(df[paste0("qc", 1:5)] >= 5) >= 4, ]
output
Sepal.Length Sepal.Width Petal.Length Petal.Width Species qc1 qc2 qc3 qc4 qc5
1 4.8 3.4 1.6 0.2 setosa 6 9 5 10 10
2 5.8 4.0 1.2 0.2 setosa 9 8 3 7 5
3 5.7 3.8 1.7 0.3 setosa 4 10 9 7 7
4 5.1 3.3 1.7 0.5 setosa 7 6 10 1 7
5 4.8 3.4 1.9 0.2 setosa 1 6 6 7 7
6 5.2 3.5 1.5 0.2 setosa 5 1 6 6 7
7 5.5 3.5 1.3 0.2 setosa 10 3 6 6 6
8 4.9 3.6 1.4 0.1 setosa 8 4 10 8 6
9 4.4 3.0 1.3 0.2 setosa 1 8 5 6 10
10 5.0 3.5 1.3 0.3 setosa 9 9 5 9 1
...