I have two large databases, and I want to know how much of the data in one is already in the other. The values are in random order across the rows, so I need a filter that indicates which of the values in df1$column are in df2$column.
For example, let's think of the columns as vectors. I have two vectors:
a=c("q","w","e","r","t","y","u","i","o")
b=c("o","u","y","t","r","e","w","q","a")
I want an output that lists the values from a that are NOT in b, for example:
>"i"
Hope this is understandable
CodePudding user response:
You can also simply use %in%, like this:
a=c("q","w","e","r","t","y","u","i","o")
b=c("o","u","y","t","r","e","w","q","a")
a[!(a %in% b)]
#> [1] "i"
Created on 2022-07-11 by the reprex package (v2.0.1)
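Since the question is about data frame columns, the same filter can be sketched on two small data frames. The names df1, df2 and column come from the question; the values and the helper names in_df2, missing_rows and share_present are made up for illustration.

```r
# Hypothetical data frames standing in for the two large databases
df1 <- data.frame(column = c("q", "w", "e", "r", "t", "y", "u", "i", "o"))
df2 <- data.frame(column = c("o", "u", "y", "t", "r", "e", "w", "q", "a"))

# TRUE where a value of df1$column also occurs in df2$column
in_df2 <- df1$column %in% df2$column

# Rows of df1 whose value does not occur in df2$column
missing_rows <- df1[!in_df2, , drop = FALSE]

# Share of df1$column that is already present in df2$column
share_present <- mean(in_df2)
```

Here missing_rows holds the single row with "i", and share_present answers the "how much of the data" part of the question as a proportion.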
Before the question was updated:
If you want to find the opposite of intersection (symmetric difference), you can use the following code:
a=c("q","w","e","r","t","y","u","i","o")
b=c("o","u","y","t","r","e","w","q","a")
setdiff(union(b,a), intersect(b,a))
#> [1] "a" "i"
CodePudding user response:
Update, after the question was edited:
Now the function is shorter! :-)
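library(dplyr)

# Vectors from the question
a <- c("q", "w", "e", "r", "t", "y", "u", "i", "o")
b <- c("o", "u", "y", "t", "r", "e", "w", "q", "a")

# Return the values of a that have no match in b
my_function <- function(a, b) {
  from_a_not_in_b <- anti_join(data.frame(a), data.frame(b), by = c("a" = "b")) %>%
    pull(a)
  return(from_a_not_in_b)
}

my_function(a, b)
#> [1] "i"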
First answer:
This custom function does the same as @akrun's c(setdiff(b, a), setdiff(a, b)), but is somewhat clunky. Anyway, to practice functional programming, we could do:
library(dplyr)

my_function <- function(a, b) {
  from_b_not_in_a <- anti_join(data.frame(b), data.frame(a), by = c("b" = "a")) %>%
    pull(b)
  from_a_not_in_b <- anti_join(data.frame(a), data.frame(b), by = c("a" = "b")) %>%
    pull(a)
  c(from_b_not_in_a, from_a_not_in_b)
}

my_function(a, b)
#> [1] "a" "i"
CodePudding user response:
We might not want to forget: a[!(a %in% b)]
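One practical difference from setdiff() is worth a small sketch: %in% returns one result per element, so repeated values are kept, while setdiff() returns each value once. The vector a_dup below is a made-up example, not data from the question.

```r
# Hypothetical vector with a repeated value that is absent from b
a_dup <- c("q", "i", "w", "i")
b <- c("o", "u", "y", "t", "r", "e", "w", "q", "a")

# %in% tests every element, so both "i" entries are kept
kept_all <- a_dup[!(a_dup %in% b)]

# setdiff() treats its inputs as sets and drops duplicates
kept_unique <- setdiff(a_dup, b)
```

For filtering rows of a data frame, the %in% form is usually the one to use, because it lines up with the rows one to one.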
CodePudding user response:
From b but not in a: setdiff(b, a).
From a but not in b: setdiff(a, b).
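As a quick check on the vectors from the question, both directions can be run side by side. The names only_in_b, only_in_a and either_side are just illustrative.

```r
a <- c("q", "w", "e", "r", "t", "y", "u", "i", "o")
b <- c("o", "u", "y", "t", "r", "e", "w", "q", "a")

# Values that occur in b but not in a
only_in_b <- setdiff(b, a)

# Values that occur in a but not in b
only_in_a <- setdiff(a, b)

# Both directions together (the symmetric difference)
either_side <- c(only_in_b, only_in_a)
```

With these inputs only_in_b is "a" and only_in_a is "i".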