How can I cross check a data frame if all possible combinations on it exist in another data frame of-CodePudding

I have two data frames.

The first one that contain all the possible combinations with their corresponding value and looks like this :

first	second	val
Alpha	Beta	10
Alpha	Corn	20
Alpha	Desk	30
Beta	Corn	40
Betea	Desk	50
Corn	Desk	60
Hat	Ian	70

The second one that comes from the production line has two columns the date column that has grouped all the variables corresponding to their date and are concatenated :

date	var
2022-01-01	A
2022-02-01	Beta,Corn,Fanta,Epsilon,George,Hat,Ian

I want to find all the combinations in the second data frame and to see if they match with any combinations in the first data frame.If a variable stands alone in the second data frame as Alpha in 2022-01-01 to give me the 0 and otherwise the value of the combination.

Ideally I want the resulted data frame to look like this :

date	comb	val
2022-01-01	Alpha	0
2022-02-01	Beta,Corn	40
2022-02-01	Hat,Ian	70

How can I do this in R using dplyr ?



library(tidyverse)
first = c("Alpha","Alpha","Alpha","Beta","Beta","Corn","Hat")
second = c("Beta","Corn","Desk","Corn","Desk","Desk","Ian")
val = c(10,20,30,40,50,60,70)
df1 = tibble(first,second,val);df1
date = c(as.Date("2022-01-01"),as.Date("2022-02-01"))
var = c("Alpha","Beta,Corn,Fanta,Epsilon,George,Hat,Ian")
df2  = tibble(date,var);df2

CodePudding user response：

An option is to split the rows of the 'df2' based on the 'var' column delimiter , with separate_rows, grouped by 'date', do a combnation of the 'var's create the first, second columns in a tibble from the pairwise combinations, unnest the list columns, do a join with the 'df1' dataset, and another join with 'df2' (in case some dates are lost because of no matches), and unite the 'first', 'second' to create the combn after coalesceing the 'first' with the 'var'

library(dplyr)
library(tidyr)
df2 %>% 
  separate_rows(var) %>%
  group_by(date) %>% 
  summarise(var = if(length(var) > 1) list(combn(var, 2,  \(x) 
   tibble(first = x[1], second = x[2]), simplify = FALSE) %>% 
    bind_rows) else 
  list(tibble(first = var, second = var)) ) %>% 
 unnest(var) %>%
 inner_join(df1, by = c("first", "second")) %>% 
 full_join(df2, by = "date") %>% 
 mutate(first = coalesce(first, var)) %>%
  unite(combn, first, second, sep = ", ") %>% 
 select(-var)

-output

# A tibble: 3 × 3
  date       combn        val
  <date>     <chr>      <dbl>
1 2022-02-01 Beta, Corn    40
2 2022-02-01 Hat, Ian      70
3 2022-01-01 Alpha, NA     NA