Home > database >  Merging two data frames on specific values
Merging two data frames on specific values

Time:02-16

I have two dataframes I am trying to merge but only keep unique pairs of a specific value. In this case it is apple. These are my df's



df1 %>% head()
  values    pair_num
   <ch>     <int>
1  apple      1         
2  pb         1          
3  apple      2          
4  ranch      2          
5  apple      3          
6  sauce      3 
7  apple      4         
8  juice      4          
9  apple      5          
10 cider      5    
      

The second contains a huge list of scores for each value at different times:


df2 %>% head()
  values    score     time
   <ch>     <int>     <ch>
1  apple      44     breakfast
2  apple      23     lunch     
3  apple      17     dinner     
4  apple      12     snack     
5  orange     5      breakfast
6  orange     88     lunch
7  orange     29     dinner     
8  orange     53     snack 
9  sauce      7      breakfast
10 sauce      99     dinner   
11 grape      17     dinner     
12 grape      12     snack     
13 grape      5      breakfast
14 grape      88     lunch
15 juice      29     breakfast     
16 juice      53     snack 
17 cider      27     lunch 
18 cider      11     breakfast

I am trying to create a third dataframe where each pair is checked against a master list containing the scores of each individual value, however some of the values may or may not always appear more than once depending on the time. Ideally my third data frame would select only the complete pairs with respective scores at the times they appeared and would look like this:

df3 %>% head()
  values   pair_num  score    time
   <ch>     <int>    <int>    <ch>
1  apple      3       44     breakfast
2  sauce      3       7      breakfast    
3  apple      3       17     dinner     
4  sauce      3       99     dinner     
5  apple      4       44     breakfast
6  juice      4       29     breakfast
7  apple      4       12     snack     
8  juice      4       53     snack 
9  apple      5       44     breakfast
10 cider      5       11     breakfast
11 apple      5       23     lunch
12 cider      5       27     lunch

I have tried to merge and perform joins on the data frames while keeping only unique pair values but I keep getting back a bunch of incomplete pairs and values I don't want. My goal is to end up with a df so I can facet the plots (~time) and view the relationship between apples and its partners individually.

CodePudding user response:

If I understand you correctly, you want to end up with a merge between df1 and df2, but where missing pairs are removed?

With dplyr, you can it like this:

library(dplyr, warn.conflicts = FALSE)

df1 %>% 

  # join the values
  left_join(df2) %>% 

  # filter out missing values
  filter(!is.na(score)) %>% 

  # group by pair and time
  group_by(pair_num, time) %>% 

  # filter out missing pairs
  filter(n() > 1) %>% 

  # arrange to show results better
  arrange(pair_num, time)

#> Joining, by = "values"
#> # A tibble: 12 x 4
#> # Groups:   pair_num, time [6]
#>    values pair_num score time     
#>    <chr>     <dbl> <dbl> <chr>    
#>  1 apple         3    44 breakfast
#>  2 sauce         3     7 breakfast
#>  3 apple         3    17 dinner   
#>  4 sauce         3    99 dinner   
#>  5 apple         4    44 breakfast
#>  6 juice         4    29 breakfast
#>  7 apple         4    12 snack    
#>  8 juice         4    53 snack    
#>  9 apple         5    44 breakfast
#> 10 cider         5    11 breakfast
#> 11 apple         5    23 lunch    
#> 12 cider         5    27 lunch

Data

df1 <- tribble(
  ~values,    ~pair_num,
  "apple",      1,         
  "pb",         1,          
  "apple",      2,          
  "ranch",      2,          
  "apple",      3,          
  "sauce",      3, 
  "apple",      4,         
  "juice",      4,          
  "apple",      5,          
  "cider",      5    
)


df2 <- tribble(
  ~values,    ~score,     ~time,
  "apple",      44,     "breakfast",
  "apple",      23,     "lunch",     
  "apple",      17,     "dinner",     
  "apple",      12,     "snack",     
  "orange",     5,      "breakfast",
  "orange",     88,     "lunch",
  "orange",     29,     "dinner",     
  "orange",     53,     "snack", 
  "sauce",      7,      "breakfast",
  "sauce",      99,     "dinner",   
  "grape",      17,     "dinner",     
  "grape",      12,     "snack",     
  "grape",      5,      "breakfast",
  "grape",      88,     "lunch",
  "juice",      29,     "breakfast",     
  "juice",      53,     "snack", 
  "cider",      27,     "lunch", 
  "cider",      11,     "breakfast"
)
  •  Tags:  
  • r
  • Related