How can I search in a data frame if all possible combinations exist in another data frame in R using-CodePudding

I have two data frames.

The first one that contains all the possible combinations with their corresponding values and looks like this:

first	second	val
A	B	10
A	C	20
A	D	30
B	C	40
B	D	50
C	D	60
H	I	70

The second one that comes from the production line has two columns the date column that has grouped all the variables corresponding to their date and are concatenated:

date	var
2022-01-01	A
2022-02-01	B,C,F,E,G,H,I

I want to find all the combinations in the second data frame and to see if they match with any combinations in the first data frame. If a variable stands alone in the second data frame as A in 2022-01-01 to give me the 0 and otherwise the value of the combination.

Ideally I want the resulting data frame to look like this:

date	comb	val
2022-01-01	A	0
2022-02-01	B,C	40
2022-02-01	H,I	70

How can I do this in R using dplyr?

library(tidyverse)
first = c("A","A","A","B","B","C","H")
second = c("B","C","D","C","D","D","I")
val = c(10,20,30,40,50,60,70)
df1 = tibble(first,second,val);df1
date = c(as.Date("2022-01-01"),as.Date("2022-02-01"))
var = c("A","B,C,F,E,G,H,I")
df2  = tibble(date,var);df2

CodePudding user response：

Using tidyverse:

library(tidyverse)
first = c("A","A","A","B","B","C","H")
second = c("B","C","D","C","D","D","I")
val = c(10,20,30,40,50,60,70)
df1 = tibble(first,second,val);df1
date = c(as.Date("2022-01-01"),as.Date("2022-02-01"))
var = c("A","B,C,F,E,G,H,I")
df2  = tibble(date,var);df2 

df2_tidy <- df2 %>%
  mutate(first = str_split(var, ","),
         second = first) %>%
  unnest(first) %>%
  unnest(second) %>%
  select(-var)

singles <- df2 %>%
  filter(!str_detect(var, ",")) %>%
  mutate(val = 0) %>%
  select(date, comb = var, val)

combs <- df1 %>%
  inner_join(df2_tidy, by = c("first", "second")) %>%
  mutate(comb = paste(first, second, sep = ",")) %>%
  select(date, comb, val)

bind_rows(singles, combs)