I'm sure there is a better way to ask this question, and perhaps if I knew it I'd have found an existing answer. The gist is this:
- I'm reading in SQL data with R with NEW, UPDATE, and DELETE rows
- table uniqueness is determined by values in two columns (call them
a
andb
) - I want (a) find all combinations of
a
andb
in which columnc
has a DELETE value, and (b) then remove all rows with one of thesea
andb
combinations
Here's a reproducible example:
library(dplyr)
df <- data.frame(
a = c(1, 1, 1, 2, 2),
b = c(3, 3, 4, 5, 5),
c = c("new", "delete", "new", "new", "update")
)
What I tried, not thinking this through:
df %>%
mutate(filter_key = paste0(a, "_", b)) %>%
filter(!(filter_key %in% (df %>%
filter(c == "delete") %>%
pull(filter_key) %>%
unique())))
The problem is that filter_key
is not in df
, so it's not there to pull unique values for. I'm curious if there is a way to refer to it "on the fly"? Some answers led me to think I could perhaps use .
or .data
, but this didn't work for me (replacing df
with .
or .data
).
The trivial solution is to add it beforehand:
df <- df %>% mutate(filter_key = paste0(a, "_", b))
df %>%
filter(!(filter_key %in% (df %>%
filter(c == "delete") %>%
pull(filter_key) %>%
unique())))
Alternatively (though this changes the question, so I'll leave it in these "notes"), there's probably a way to just filter directly, but I wasn't sure how to say:
- give me all the combos of
a
andb
where any row hasc == "delete"
- remove all rows where
a
andb
match one of these combos
CodePudding user response:
You can do:
library(tidyverse)
df %>%
mutate(both = str_c(a, b)) %>%
filter(!both %in% (cur_data() %>%
filter(c == 'delete') %>%
pull(both)))
a b c both
1 1 4 new 14
2 2 5 new 25
3 2 5 update 25
CodePudding user response:
Use an anti-join:
df %>% anti_join(filter(., c == "delete"), by = c("a", "b"))
giving
a b c
1 1 4 new
2 2 5 new
3 2 5 update