Home > Software design >  How can I merge two data frames and extract the unique combinations of columns?
How can I merge two data frames and extract the unique combinations of columns?

Time:10-28

I have two data frames. The first one which is the reference df and looks like this :

var1 var2
a e
b z
c f
d h

and the second one that is my universe:

sym1 sym2
e a
b f
b z
f c
n s
n k
k l

I want to merge them and take only the unique combinations that exist in reference dataframe from all the possible pair in the universe.

Ideally I want the reported data frame to look like this :

var1 var2
a e
b z
c f
NA NA
library(tidyverse)
var1 = c("a","b","c","d")
var2 = c("e","z","f","h")
ref = tibble(var1,var2);ref
sym1 = c("e","b","b","f","n","n","k")
sym2 = c("a","f","z","c","s","k","l")
univ = tibble(sym1,sym2);univ

How can I do this in R using dplyr ?

CodePudding user response:

In base R:

s <- apply(univ, 1, \(x) paste(sort(x), collapse = " "))
r <- paste(ref$var1, ref$var2)
univ[match(r, s), ]
  sym1  sym2 
1 e     a    
2 b     z    
3 f     c    
4 NA    NA   

In a single tidyverse-friendly pipe:

library(stringr)
library(dplyr)
univ %>% 
  rowwise() %>% 
  mutate(s = str_c(sort(c_across(everything())), collapse = "")) %>% 
  pull(s) %>% 
  match(str_c(ref$var1, ref$var2), .) %>% 
  univ[., ]

CodePudding user response:

A base R one-liner

ref[is.na(do.call(match, lapply(list(unname(ref), unname(univ)), apply, 1, sort, simplify = FALSE))), ] <- NA

gives

> ref
# A tibble: 4 × 2
  var1  var2
  <chr> <chr>
1 a     e
2 b     z
3 c     f
4 NA    NA
  • Related