Home > Software design >  How can I search in a data frame if all possible combinations exist in another data frame in R using
How can I search in a data frame if all possible combinations exist in another data frame in R using

Time:11-13

I have two data frames.

The first one that contains all the possible combinations with their corresponding values and looks like this:

first second val
A B 10
A C 20
A D 30
B C 40
B D 50
C D 60
H I 70

The second one that comes from the production line has two columns the date column that has grouped all the variables corresponding to their date and are concatenated:

date var
2022-01-01 A
2022-02-01 B,C,F,E,G,H,I

I want to find all the combinations in the second data frame and to see if they match with any combinations in the first data frame. If a variable stands alone in the second data frame as A in 2022-01-01 to give me the 0 and otherwise the value of the combination.

Ideally I want the resulting data frame to look like this:

date comb val
2022-01-01 A 0
2022-02-01 B,C 40
2022-02-01 H,I 70

How can I do this in R using dplyr?

library(tidyverse)
first = c("A","A","A","B","B","C","H")
second = c("B","C","D","C","D","D","I")
val = c(10,20,30,40,50,60,70)
df1 = tibble(first,second,val);df1
date = c(as.Date("2022-01-01"),as.Date("2022-02-01"))
var = c("A","B,C,F,E,G,H,I")
df2  = tibble(date,var);df2

CodePudding user response:

Using tidyverse:

library(tidyverse)
first = c("A","A","A","B","B","C","H")
second = c("B","C","D","C","D","D","I")
val = c(10,20,30,40,50,60,70)
df1 = tibble(first,second,val);df1
date = c(as.Date("2022-01-01"),as.Date("2022-02-01"))
var = c("A","B,C,F,E,G,H,I")
df2  = tibble(date,var);df2 

df2_tidy <- df2 %>%
  mutate(first = str_split(var, ","),
         second = first) %>%
  unnest(first) %>%
  unnest(second) %>%
  select(-var)

singles <- df2 %>%
  filter(!str_detect(var, ",")) %>%
  mutate(val = 0) %>%
  select(date, comb = var, val)

combs <- df1 %>%
  inner_join(df2_tidy, by = c("first", "second")) %>%
  mutate(comb = paste(first, second, sep = ",")) %>%
  select(date, comb, val)

bind_rows(singles, combs)
  • Related