I have two data frames. The first one which is the reference df and looks like this :
var1 | var2 |
---|---|
a | e |
b | z |
c | f |
d | h |
and the second one that is my universe:
sym1 | sym2 |
---|---|
e | a |
b | f |
b | z |
f | c |
n | s |
n | k |
k | l |
I want to merge them and take only the unique combinations that exist in reference dataframe from all the possible pair in the universe.
Ideally I want the reported data frame to look like this :
var1 | var2 |
---|---|
a | e |
b | z |
c | f |
NA | NA |
library(tidyverse)
var1 = c("a","b","c","d")
var2 = c("e","z","f","h")
ref = tibble(var1,var2);ref
sym1 = c("e","b","b","f","n","n","k")
sym2 = c("a","f","z","c","s","k","l")
univ = tibble(sym1,sym2);univ
How can I do this in R using dplyr ?
CodePudding user response:
In base R:
s <- apply(univ, 1, \(x) paste(sort(x), collapse = " "))
r <- paste(ref$var1, ref$var2)
univ[match(r, s), ]
sym1 sym2
1 e a
2 b z
3 f c
4 NA NA
In a single tidyverse-friendly pipe:
library(stringr)
library(dplyr)
univ %>%
rowwise() %>%
mutate(s = str_c(sort(c_across(everything())), collapse = "")) %>%
pull(s) %>%
match(str_c(ref$var1, ref$var2), .) %>%
univ[., ]
CodePudding user response:
A base R one-liner
ref[is.na(do.call(match, lapply(list(unname(ref), unname(univ)), apply, 1, sort, simplify = FALSE))), ] <- NA
gives
> ref
# A tibble: 4 × 2
var1 var2
<chr> <chr>
1 a e
2 b z
3 c f
4 NA NA