How can I merge two data frames and extract the unique combinations of columns?-CodePudding

I have two data frames. The first one which is the reference df and looks like this :

var1	var2
a	e
b	z
c	f
d	h

and the second one that is my universe:

sym1	sym2
e	a
b	f
b	z
f	c
n	s
n	k
k	l

I want to merge them and take only the unique combinations that exist in reference dataframe from all the possible pair in the universe.

Ideally I want the reported data frame to look like this :

var1	var2
a	e
b	z
c	f
NA	NA

library(tidyverse)
var1 = c("a","b","c","d")
var2 = c("e","z","f","h")
ref = tibble(var1,var2);ref
sym1 = c("e","b","b","f","n","n","k")
sym2 = c("a","f","z","c","s","k","l")
univ = tibble(sym1,sym2);univ

How can I do this in R using dplyr ?

CodePudding user response：

In base R:

s <- apply(univ, 1, \(x) paste(sort(x), collapse = " "))
r <- paste(ref$var1, ref$var2)
univ[match(r, s), ]

  sym1  sym2 
1 e     a    
2 b     z    
3 f     c    
4 NA    NA

In a single tidyverse-friendly pipe:

library(stringr)
library(dplyr)
univ %>% 
  rowwise() %>% 
  mutate(s = str_c(sort(c_across(everything())), collapse = "")) %>% 
  pull(s) %>% 
  match(str_c(ref$var1, ref$var2), .) %>% 
  univ[., ]

CodePudding user response：

A base R one-liner

ref[is.na(do.call(match, lapply(list(unname(ref), unname(univ)), apply, 1, sort, simplify = FALSE))), ] <- NA

gives

> ref
# A tibble: 4 × 2
  var1  var2
  <chr> <chr>
1 a     e
2 b     z
3 c     f
4 NA    NA