How to find which values in one column also appear in another column?

I have two large data sets and I want to know how much of the data in one is already in the other. The values are in random order across the rows, so I need a filter that indicates which of the values in df1$column are in df2$column.

For example, let's think of these as vectors. I have two vectors:

a=c("q","w","e","r","t","y","u","i","o")
b=c("o","u","y","t","r","e","w","q","a")

I want an output that lists the values from a that are NOT in b, for example:

>"i"

Hope this is understandable

CodePudding user response：

You can also simply use %in% like this:

a=c("q","w","e","r","t","y","u","i","o")
b=c("o","u","y","t","r","e","w","q","a")
a[!(a %in% b)]
#> [1] "i"

Created on 2022-07-11 by the reprex package (v2.0.1)
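Since the question is really about data frame columns, here is a rough sketch of the same idea on data frames. Only the names df1, df2 and column come from the question; the values are made up for illustration, and in_df2 and missing_rows are hypothetical helper names.

```r
# Hypothetical data frames; only df1, df2 and `column` are taken from the question
df1 <- data.frame(column = c("q", "w", "e", "r", "t", "y", "u", "i", "o"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(column = c("o", "u", "y", "t", "r", "e", "w", "q", "a"),
                  stringsAsFactors = FALSE)

# TRUE where the value in df1$column also occurs somewhere in df2$column
df1$in_df2 <- df1$column %in% df2$column

# Rows of df1 whose value does not occur in df2$column
missing_rows <- df1[!df1$in_df2, , drop = FALSE]
missing_rows$column
#> [1] "i"
```

The row order in either data frame does not matter, because %in% checks each value of the left side against the whole right side.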

Before the question was updated:

If you want to find the opposite of intersection (symmetric difference), you can use the following code:

a=c("q","w","e","r","t","y","u","i","o")
b=c("o","u","y","t","r","e","w","q","a")
setdiff(union(b,a), intersect(b,a))
#> [1] "a" "i"


CodePudding user response：

Update, after the question was edited:

Now the function is shorter! :-)

library(dplyr)

my_function <- function(a, b) {
  # Keep the values of a that have no match in b
  from_a_not_in_b <- anti_join(data.frame(a), data.frame(b), by = c("a" = "b")) %>%
    pull(a)
  return(from_a_not_in_b)
}

my_function(a,b)

[1] "i"

First answer: This custom function does the same as @akrun's c(setdiff(b, a), setdiff(a, b)), but is somewhat clunky. Still, to practice functional programming we could do:

library(dplyr)

my_function <- function(a, b) {
  from_b_not_in_a <- anti_join(data.frame(b), data.frame(a), by = c("b" = "a")) %>%
    pull(b)

  from_a_not_in_b <- anti_join(data.frame(a), data.frame(b), by = c("a" = "b")) %>%
    pull(a)

  c(from_b_not_in_a, from_a_not_in_b)
}

my_function(a,b)

[1] "a" "i"

CodePudding user response：

We might not want to forget: a[!(a %in% b)]
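
The question also asks how much of the data is already in the other set. As a small sketch with the vectors from the question, the logical vector returned by %in% can be summed or averaged. The names found, n_found and share_found are just illustrative.

```r
a <- c("q", "w", "e", "r", "t", "y", "u", "i", "o")
b <- c("o", "u", "y", "t", "r", "e", "w", "q", "a")

# One TRUE/FALSE per element of a: is it present anywhere in b?
found <- a %in% b

n_found <- sum(found)       # number of values of a that are also in b
share_found <- mean(found)  # proportion of values of a that are also in b

n_found
#> [1] 8
```

Here 8 of the 9 values in a are found in b, so share_found is 8/9.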

CodePudding user response：

From b but not in a: setdiff(b, a).

From a but not in b: setdiff(a, b).
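
One difference worth keeping in mind when choosing between setdiff and %in%: setdiff returns each missing value once, while subsetting with %in% keeps every occurrence. A small sketch with made-up vectors containing a duplicate (a_dup, b_small, once and every are hypothetical names):

```r
# Made-up vectors: "i" appears twice in a_dup and never in b_small
a_dup <- c("q", "i", "i", "w")
b_small <- c("q", "w")

# setdiff() treats its inputs as sets, so duplicates are dropped
once <- setdiff(a_dup, b_small)
once
#> [1] "i"

# Logical subsetting keeps one entry per matching element (or row)
every <- a_dup[!(a_dup %in% b_small)]
every
#> [1] "i" "i"
```

For filtering rows of a data frame the %in% form is probably the one you want, since it yields one TRUE/FALSE per row.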