I would like to filter something similar to the following dataframe:
A B C D E
1 0 0.2 2 5
1 0 0.5 5 0
1 7 0.3 4 4
0 2 20 2 2
1 0 0.9 2 1
0 7 17 4 2
How can I remove those rows with values <0.3
for column C
if A==1
and those rows with values <20
for column C
if A==0
?
This would be my expected output:
A B C D E
1 0 0.5 5 0
1 7 0.3 4 4
0 2 20 2 2
1 0 0.9 2 1
I've tryed:
library(dplyr)
test<-df %>% {if (A == '1') filter(.,C < 0.3) else filter(.,C < 20)}
But I am not achieving the expected results.
Thanks!
CodePudding user response:
You were close,
& stands for and | stands for or
library(dplyr)
test <- df %>% filter(!(A==1 & C < 0.3 | A==0 & C < 20))
CodePudding user response:
df %>%
filter(!((C < .3 & A == 1) | (C < 20 & A == 0)))
A B C D E
1 1 0 0.5 5 0
2 1 7 0.3 4 4
3 0 2 20.0 2 2
4 1 0 0.9 2 1
CodePudding user response:
To give a bit more explanation to what the other users have posted we can test each of the rows with one of your criteria at a time:
> testdata$C < 0.3
[1] TRUE FALSE FALSE FALSE FALSE FALSE
> testdata$A == 1
[1] TRUE TRUE TRUE FALSE TRUE FALSE
Then we can combine them into the first search using "and" (&
)
> testdata$C < 0.3 & testdata$A == 1
[1] TRUE FALSE FALSE FALSE FALSE FALSE
Then negate that using !
(since we want to drop these rows, not keep them):
> !(testdata$C < 0.3 & testdata$A == 1)
[1] FALSE TRUE TRUE TRUE TRUE TRUE
Note I have added brackets to show that we are negating after combining the criteria.
Similarly for the second set:
> testdata$C < 20
[1] TRUE TRUE TRUE FALSE TRUE TRUE
> testdata$A == 0
[1] FALSE FALSE FALSE TRUE FALSE TRUE
> !(testdata$C < 20 & testdata$A == 0)
[1] TRUE TRUE TRUE TRUE TRUE FALSE
We then combine them into a single test using "or" (|
):
> !((testdata$C < 0.3 & testdata$A == 1) | (testdata$C < 20 & testdata$A == 0))
[1] FALSE TRUE TRUE TRUE TRUE FALSE
And applying this logic to the filter()
command:
> testdata %>% filter(!((C < 0.3 & A == 1) | (C < 20 & A == 0)))
A B C D E
1 1 0 0.5 5 0
2 1 7 0.3 4 4
3 0 2 20.0 2 2
4 1 0 0.9 2 1