Home > Software design >  How can I cross check a data frame if all possible combinations on it exist in another data frame of
How can I cross check a data frame if all possible combinations on it exist in another data frame of

Time:11-13

I have two data frames.

The first one that contain all the possible combinations with their corresponding value and looks like this :

first second val
Alpha Beta 10
Alpha Corn 20
Alpha Desk 30
Beta Corn 40
Betea Desk 50
Corn Desk 60
Hat Ian 70

The second one that comes from the production line has two columns the date column that has grouped all the variables corresponding to their date and are concatenated :

date var
2022-01-01 A
2022-02-01 Beta,Corn,Fanta,Epsilon,George,Hat,Ian

I want to find all the combinations in the second data frame and to see if they match with any combinations in the first data frame.If a variable stands alone in the second data frame as Alpha in 2022-01-01 to give me the 0 and otherwise the value of the combination.

Ideally I want the resulted data frame to look like this :

date comb val
2022-01-01 Alpha 0
2022-02-01 Beta,Corn 40
2022-02-01 Hat,Ian 70

How can I do this in R using dplyr ?



library(tidyverse)
first = c("Alpha","Alpha","Alpha","Beta","Beta","Corn","Hat")
second = c("Beta","Corn","Desk","Corn","Desk","Desk","Ian")
val = c(10,20,30,40,50,60,70)
df1 = tibble(first,second,val);df1
date = c(as.Date("2022-01-01"),as.Date("2022-02-01"))
var = c("Alpha","Beta,Corn,Fanta,Epsilon,George,Hat,Ian")
df2  = tibble(date,var);df2 



CodePudding user response:

An option is to split the rows of the 'df2' based on the 'var' column delimiter , with separate_rows, grouped by 'date', do a combnation of the 'var's create the first, second columns in a tibble from the pairwise combinations, unnest the list columns, do a join with the 'df1' dataset, and another join with 'df2' (in case some dates are lost because of no matches), and unite the 'first', 'second' to create the combn after coalesceing the 'first' with the 'var'

library(dplyr)
library(tidyr)
df2 %>% 
  separate_rows(var) %>%
  group_by(date) %>% 
  summarise(var = if(length(var) > 1) list(combn(var, 2,  \(x) 
   tibble(first = x[1], second = x[2]), simplify = FALSE) %>% 
    bind_rows) else 
  list(tibble(first = var, second = var)) ) %>% 
 unnest(var) %>%
 inner_join(df1, by = c("first", "second")) %>% 
 full_join(df2, by = "date") %>% 
 mutate(first = coalesce(first, var)) %>%
  unite(combn, first, second, sep = ", ") %>% 
 select(-var)

-output

# A tibble: 3 × 3
  date       combn        val
  <date>     <chr>      <dbl>
1 2022-02-01 Beta, Corn    40
2 2022-02-01 Hat, Ian      70
3 2022-01-01 Alpha, NA     NA
  • Related