Filter a dataframe by column, depending on the value of another column-CodePudding

I would like to filter something similar to the following dataframe:

A    B    C    D    E
1    0    0.2  2    5
1    0    0.5  5    0
1    7    0.3  4    4
0    2    20   2    2
1    0    0.9  2    1
0    7    17   4    2

How can I remove those rows with values <0.3 for column C if A==1 and those rows with values <20 for column C if A==0?

This would be my expected output:

A    B    C    D    E
1    0    0.5  5    0
1    7    0.3  4    4
0    2    20   2    2
1    0    0.9  2    1

I've tryed:

library(dplyr)
test<-df %>% {if (A == '1') filter(.,C < 0.3) else filter(.,C < 20)}

But I am not achieving the expected results.

Thanks!

CodePudding user response：

You were close,

& stands for and | stands for or

library(dplyr)

test <- df %>% filter(!(A==1 & C < 0.3 | A==0 & C < 20))

CodePudding user response：

df %>% 
  filter(!((C < .3 & A == 1) | (C < 20 & A == 0)))

  A B    C D E
1 1 0  0.5 5 0
2 1 7  0.3 4 4
3 0 2 20.0 2 2
4 1 0  0.9 2 1

CodePudding user response：

To give a bit more explanation to what the other users have posted we can test each of the rows with one of your criteria at a time:

> testdata$C < 0.3
[1]  TRUE FALSE FALSE FALSE FALSE FALSE
> testdata$A == 1
[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE

Then we can combine them into the first search using "and" (&)

> testdata$C < 0.3 & testdata$A == 1
[1]  TRUE FALSE FALSE FALSE FALSE FALSE

Then negate that using ! (since we want to drop these rows, not keep them):

> !(testdata$C < 0.3 & testdata$A == 1)
[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

Note I have added brackets to show that we are negating after combining the criteria.

Similarly for the second set:

> testdata$C < 20
[1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
> testdata$A == 0
[1] FALSE FALSE FALSE  TRUE FALSE  TRUE
> !(testdata$C < 20 & testdata$A == 0)
[1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

We then combine them into a single test using "or" (|):

> !((testdata$C < 0.3 & testdata$A == 1) | (testdata$C < 20 & testdata$A == 0))
[1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE

And applying this logic to the filter() command:

> testdata %>% filter(!((C < 0.3 & A == 1) | (C < 20 & A == 0)))
  A B    C D E
1 1 0  0.5 5 0
2 1 7  0.3 4 4
3 0 2 20.0 2 2
4 1 0  0.9 2 1