I have two data frames.
The first one that contain all the possible combinations with their corresponding value and looks like this :
first | second | val |
---|---|---|
Alpha | Beta | 10 |
Alpha | Corn | 20 |
Alpha | Desk | 30 |
Beta | Corn | 40 |
Betea | Desk | 50 |
Corn | Desk | 60 |
Hat | Ian | 70 |
The second one that comes from the production line has two columns the date column that has grouped all the variables corresponding to their date and are concatenated :
date | var |
---|---|
2022-01-01 | A |
2022-02-01 | Beta,Corn,Fanta,Epsilon,George,Hat,Ian |
I want to find all the combinations in the second data frame and to see if they match with any combinations in the first data frame.If a variable stands alone in the second data frame as Alpha in 2022-01-01 to give me the 0 and otherwise the value of the combination.
Ideally I want the resulted data frame to look like this :
date | comb | val |
---|---|---|
2022-01-01 | Alpha | 0 |
2022-02-01 | Beta,Corn | 40 |
2022-02-01 | Hat,Ian | 70 |
How can I do this in R using dplyr ?
library(tidyverse)
first = c("Alpha","Alpha","Alpha","Beta","Beta","Corn","Hat")
second = c("Beta","Corn","Desk","Corn","Desk","Desk","Ian")
val = c(10,20,30,40,50,60,70)
df1 = tibble(first,second,val);df1
date = c(as.Date("2022-01-01"),as.Date("2022-02-01"))
var = c("Alpha","Beta,Corn,Fanta,Epsilon,George,Hat,Ian")
df2 = tibble(date,var);df2
CodePudding user response:
An option is to split the rows of the 'df2' based on the 'var' column delimiter ,
with separate_rows
, grouped by 'date', do a combn
ation of the 'var's create the first
, second
columns in a tibble
from the pairwise combinations, unnest
the list
columns, do a join with the 'df1' dataset, and another join with 'df2' (in case some dates are lost because of no matches), and unite
the 'first', 'second' to create the combn
after coalesce
ing the 'first' with the 'var'
library(dplyr)
library(tidyr)
df2 %>%
separate_rows(var) %>%
group_by(date) %>%
summarise(var = if(length(var) > 1) list(combn(var, 2, \(x)
tibble(first = x[1], second = x[2]), simplify = FALSE) %>%
bind_rows) else
list(tibble(first = var, second = var)) ) %>%
unnest(var) %>%
inner_join(df1, by = c("first", "second")) %>%
full_join(df2, by = "date") %>%
mutate(first = coalesce(first, var)) %>%
unite(combn, first, second, sep = ", ") %>%
select(-var)
-output
# A tibble: 3 × 3
date combn val
<date> <chr> <dbl>
1 2022-02-01 Beta, Corn 40
2 2022-02-01 Hat, Ian 70
3 2022-01-01 Alpha, NA NA