I have two dataframes I am trying to merge but only keep unique pairs of a specific value. In this case it is apple. These are my df's
df1 %>% head()
values pair_num
<ch> <int>
1 apple 1
2 pb 1
3 apple 2
4 ranch 2
5 apple 3
6 sauce 3
7 apple 4
8 juice 4
9 apple 5
10 cider 5
The second contains a huge list of scores for each value at different times:
df2 %>% head()
values score time
<ch> <int> <ch>
1 apple 44 breakfast
2 apple 23 lunch
3 apple 17 dinner
4 apple 12 snack
5 orange 5 breakfast
6 orange 88 lunch
7 orange 29 dinner
8 orange 53 snack
9 sauce 7 breakfast
10 sauce 99 dinner
11 grape 17 dinner
12 grape 12 snack
13 grape 5 breakfast
14 grape 88 lunch
15 juice 29 breakfast
16 juice 53 snack
17 cider 27 lunch
18 cider 11 breakfast
I am trying to create a third dataframe where each pair is checked against a master list containing the scores of each individual value, however some of the values may or may not always appear more than once depending on the time. Ideally my third data frame would select only the complete pairs with respective scores at the times they appeared and would look like this:
df3 %>% head()
values pair_num score time
<ch> <int> <int> <ch>
1 apple 3 44 breakfast
2 sauce 3 7 breakfast
3 apple 3 17 dinner
4 sauce 3 99 dinner
5 apple 4 44 breakfast
6 juice 4 29 breakfast
7 apple 4 12 snack
8 juice 4 53 snack
9 apple 5 44 breakfast
10 cider 5 11 breakfast
11 apple 5 23 lunch
12 cider 5 27 lunch
I have tried to merge and perform joins on the data frames while keeping only unique pair values but I keep getting back a bunch of incomplete pairs and values I don't want. My goal is to end up with a df so I can facet the plots (~time) and view the relationship between apples and its partners individually.
CodePudding user response:
If I understand you correctly, you want to end up with a merge between df1
and df2
, but where missing pairs are removed?
With dplyr, you can it like this:
library(dplyr, warn.conflicts = FALSE)
df1 %>%
# join the values
left_join(df2) %>%
# filter out missing values
filter(!is.na(score)) %>%
# group by pair and time
group_by(pair_num, time) %>%
# filter out missing pairs
filter(n() > 1) %>%
# arrange to show results better
arrange(pair_num, time)
#> Joining, by = "values"
#> # A tibble: 12 x 4
#> # Groups: pair_num, time [6]
#> values pair_num score time
#> <chr> <dbl> <dbl> <chr>
#> 1 apple 3 44 breakfast
#> 2 sauce 3 7 breakfast
#> 3 apple 3 17 dinner
#> 4 sauce 3 99 dinner
#> 5 apple 4 44 breakfast
#> 6 juice 4 29 breakfast
#> 7 apple 4 12 snack
#> 8 juice 4 53 snack
#> 9 apple 5 44 breakfast
#> 10 cider 5 11 breakfast
#> 11 apple 5 23 lunch
#> 12 cider 5 27 lunch
Data
df1 <- tribble(
~values, ~pair_num,
"apple", 1,
"pb", 1,
"apple", 2,
"ranch", 2,
"apple", 3,
"sauce", 3,
"apple", 4,
"juice", 4,
"apple", 5,
"cider", 5
)
df2 <- tribble(
~values, ~score, ~time,
"apple", 44, "breakfast",
"apple", 23, "lunch",
"apple", 17, "dinner",
"apple", 12, "snack",
"orange", 5, "breakfast",
"orange", 88, "lunch",
"orange", 29, "dinner",
"orange", 53, "snack",
"sauce", 7, "breakfast",
"sauce", 99, "dinner",
"grape", 17, "dinner",
"grape", 12, "snack",
"grape", 5, "breakfast",
"grape", 88, "lunch",
"juice", 29, "breakfast",
"juice", 53, "snack",
"cider", 27, "lunch",
"cider", 11, "breakfast"
)