I have a data frame with a given number of columns, say 5 for example. I have a condition for each of the columns and want to select the rows which match 4 out of 5 conditions.
For a simple example imagine I wanted the rows where the value for at least 3 of columns A to E is greater than 1.
I know how to filter using tidyverse for specific conditions, e.g. if column A > 1 and column B < 5, but I'm not sure how to filter for rows that meet some but not all of the conditions I set. Perhaps a rather simple question, but I can't find an immediate answer online and am under a bit of time pressure. I am very much a beginner, so if possible please keep explanations as simple as possible. Thanks!
CodePudding user response:
As boolean values can be turned into 0 or 1 (numeric), you can add together your 5 conditions and check whether that sum is at least 4:
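library(tidyverse)
# 5 columns (V1 to V5), each a random permutation of 1:10
df = as_tibble(replicate(5, sample(1:10)))
df %>%
  # each comparison is TRUE/FALSE; adding them counts how many conditions a row meets
  mutate(cond = (V1>5) + (V2>2) + (V3<4) + (V4>7) + (V5<2)) %>%
  filter(cond >= 4)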
# A tibble: 3 x 6
     V1    V2    V3    V4    V5  cond
  <int> <int> <int> <int> <int> <int>
1     9    10     3     8     3     4
2     1     7     1    10     6     3
3    10     8     2     9     2     4
Note: you can do it in one step; I only separated it so you can see the sum column.
df %>% filter((V1>5) + (V2>2) + (V3<4) + (V4>7) + (V5<2) >= 4)
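For the simpler case from the question, where the same condition applies to every column (at least 3 of columns A to E greater than 1), you probably don't need to type each comparison out. Here is a minimal sketch, assuming dplyr 1.0.0 or later; the data frame and its values are made up purely for illustration. It relies on the same idea as above: TRUE counts as 1 and FALSE as 0, so rowSums() gives the number of conditions each row meets.

```r
library(dplyr)

# Made-up example data, only for illustration
df <- tibble(
  A = c(2, 0, 3, 1),
  B = c(5, 1, 0, 2),
  C = c(0, 2, 4, 1),
  D = c(3, 0, 2, 0),
  E = c(1, 1, 1, 3)
)

# Logical values behave like 0/1 when summed
sum(c(TRUE, TRUE, FALSE))  # 2

# across() applies the test "> 1" to columns A to E,
# rowSums() counts the TRUEs per row, and filter() keeps rows with 3 or more
result <- df %>%
  filter(rowSums(across(A:E, ~ .x > 1)) >= 3)

result
```

With this data, only the first and third rows meet the condition in at least 3 columns, so those are the two rows kept. If each column needs a different condition, the explicit sum from the answer above is likely the clearer option.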