I have two data frames.
The first one that contains all the possible combinations with their corresponding values and looks like this:
first | second | val |
---|---|---|
A | B | 10 |
A | C | 20 |
A | D | 30 |
B | C | 40 |
B | D | 50 |
C | D | 60 |
H | I | 70 |
The second one that comes from the production line has two columns the date column that has grouped all the variables corresponding to their date and are concatenated:
date | var |
---|---|
2022-01-01 | A |
2022-02-01 | B,C,F,E,G,H,I |
I want to find all the combinations in the second data frame and to see if they match with any combinations in the first data frame. If a variable stands alone in the second data frame as A in 2022-01-01 to give me the 0 and otherwise the value of the combination.
Ideally I want the resulting data frame to look like this:
date | comb | val |
---|---|---|
2022-01-01 | A | 0 |
2022-02-01 | B,C | 40 |
2022-02-01 | H,I | 70 |
How can I do this in R using dplyr?
library(tidyverse)
first = c("A","A","A","B","B","C","H")
second = c("B","C","D","C","D","D","I")
val = c(10,20,30,40,50,60,70)
df1 = tibble(first,second,val);df1
date = c(as.Date("2022-01-01"),as.Date("2022-02-01"))
var = c("A","B,C,F,E,G,H,I")
df2 = tibble(date,var);df2
CodePudding user response:
Using tidyverse
:
library(tidyverse)
first = c("A","A","A","B","B","C","H")
second = c("B","C","D","C","D","D","I")
val = c(10,20,30,40,50,60,70)
df1 = tibble(first,second,val);df1
date = c(as.Date("2022-01-01"),as.Date("2022-02-01"))
var = c("A","B,C,F,E,G,H,I")
df2 = tibble(date,var);df2
df2_tidy <- df2 %>%
mutate(first = str_split(var, ","),
second = first) %>%
unnest(first) %>%
unnest(second) %>%
select(-var)
singles <- df2 %>%
filter(!str_detect(var, ",")) %>%
mutate(val = 0) %>%
select(date, comb = var, val)
combs <- df1 %>%
inner_join(df2_tidy, by = c("first", "second")) %>%
mutate(comb = paste(first, second, sep = ",")) %>%
select(date, comb, val)
bind_rows(singles, combs)