I have 2 data frames of a single column
data frame A:
var |
---|
A |
B |
and data frame B:
var |
---|
A |
C |
library(tidyverse)
A = tibble(var = c("A","B"))
B = tibble(var = c("A","C"))
I want to find all the possible combinations of the elements in the var column in a new data frame that must look like this :
varA | varB |
---|---|
A | A |
A | B |
A | C |
B | C |
Now from this resulted data frame I want to find this combinations in another data frame no matter the position (first or second,varA - varB) of the new data frame and give me the column y with the corresponded values.
varA | varB | y |
---|---|---|
B | A | 1.12 |
A | A | 1.14 |
C | A | 1.15 |
B | C | 1.16 |
D | F | 1.23 |
H | G | 1.1 |
D = tibble(varA = c("B","A","C","B","D","H"),
varB = c("A","A","A","C","F","G"),
y = c(1.12,1.14,1.15,1.16,1.23,1.10));D
ideally I want to look like this :
varA | varB | y |
---|---|---|
A | A | 1.14 |
A | B | 1.12 |
A | C | 1.15 |
B | C | 1.16 |
How can I do this in R using dplyr package or base functions ?
CodePudding user response:
Here is my attempt.
library(tidyverse)
A = tibble(var = c("A","B"))
B = tibble(var = c("A","C"))
# create data frame of all combinations
B <- expand.grid(A$var, B$var)
D <- tibble(varA = c("B","A","C","B","D","H"),
varB = c("A","A","A","C","F","G"),
y = c(1.12,1.14,1.15,1.16,1.23,1.10));D
# create new column and sort
strSort <- function(x)
sapply(lapply(strsplit(x, NULL), sort), paste, collapse = "")
D <- D %>% mutate(var_new = strSort(paste(varA, varB)))
B <- B %>% mutate(var_new = strSort(paste(Var1, Var2)))
left_join(B, D) %>% select(Var1, Var2, y)
I use expand.grid to create all possible combination, and then I create a new column for both data frames with the sorted string (sorting function), to make sure the order does not matter, then it is a simple left join and cleaning the resulting data frame.