I am trying to filter a dataframe in R using dplyr where I want to check on a combination of 2 columns. Lets say I have the following dataframe;
df <- data.frame(val1 = c("a","b","c","s","s","s"),
val2 = c(10, 5, 2, 2, 10, 5))
resulting in the following df;
val1 val2
1 a 10
2 b 5
3 c 2
4 s 2
5 s 10
6 s 5
I want to filter out all rows where val1 == 's' EXCEPT when val2 on the same row is 5 or 10. So the result should look like this;
val1 val2
1 a 10
2 b 5
3 c 2
4 s 10
5 s 5
I want to use the filter function from dplyr because I am using it with other filters in a query on a oracle database. I tried the following lines, but they do not work for they either filter out all rows where val1 == "s" or all rows where val2 == 5 or val2 == 10 (so in this example, including the rows where val1 == "a" or val1 == "b";
filter(val1 == "s" & val2 == 10 | val2 == 5)
filter(val1 == "s" && val2 == 10 | val2 == 5)
filter(val1 == "s" & (val2 == 10 | val2 == 5))
filter(val1 == "s" && (val2 == 10 | val2 == 5))
I tried looking up similar questions but they do not seem to answer my problem. I make it work by combining val1 and val2 as val3, and then I filter out all rows where val3 == "s5" or val3=="s10", but as I said, I intend to use the filter in a query to a database. Therefore, I would prefer to filter as much as possible in the query itself because my actual dataframe consists of hundreds of thousands of rows and 50 columns.
I do not think it would be hard, but I just can't seem to get it. Any help will be greatly appreciated!
CodePudding user response:
This works:
df %>% filter(!(val1 == 's' & !val2 %in% c(10, 5)))
Or equivalently:
df %>% filter((val1 != 's' | val2 %in% c(10, 5)))
output:
val1 val2
1 a 10
2 b 5
3 c 2
4 s 10
5 s 5
CodePudding user response:
I want to filter out all rows where val1 == 's' EXCEPT when val2 on the same row is 5 or 10.
filter
's conditions are for the rows to keep, so I would re-frame this as "I want to keep rows where val1 != 's'
or where val2
is 5 or 10":
And generally it's nicer to use %in%
as in x %in% c(1, 2, 3)
rather than x == 1 | x == 2 | x == 3
.
df %>%
filter(val1 != 's' | val2 %in% c(5, 10))
# val1 val2
# 1 a 10
# 2 b 5
# 3 c 2
# 4 s 10
# 5 s 5