Home > Blockchain >  How to join tibbles while merging non-matched columns into lists
How to join tibbles while merging non-matched columns into lists

Time:08-12

I'm looking for a way to perform a full join on 2 tibbles, by a column with unique indices, in a way that would preserve the original column names and merge (non-identical) values into a vector or list. The tibbles have the same column names.

Example input tibbles

> a
# A tibble: 3 × 3
     id name   location
  <dbl> <chr>  <chr>   
1     1 Caspar NL      
2     2 Monica USA     
3     3 Martin DE  
> b
# A tibble: 3 × 3
     id name    location
  <dbl> <chr>   <chr>   
1     1 Caspar  WWW     
2     2 Monique USA     
3     4 Francis FR         

Desired output

enter image description here or: enter image description here

The ability to handle more than just 2 tibbles at the same time would be ideal.

All I know is dyplr's full_join(), which doesn't give me the desired result:

> dplyr::full_join(a,b, by='id')
# A tibble: 4 × 5
     id name.x location.x name.y  location.y
  <dbl> <chr>  <chr>      <chr>   <chr>     
1     1 Caspar NL         Caspar  WWW       
2     2 Monica USA        Monique USA       
3     3 Martin DE         NA      NA        
4     4 NA     NA         Francis FR    

Reprex

a <- tibble::tribble(~id, ~name, ~location, 1, 'Caspar', 'NL', 2, 'Monica', 'USA', 3, 'Martin', 'DE')
b <- tibble::tribble(~id, ~name, ~location, 1, 'Caspar', 'WWW', 2, 'Monique', 'USA', 4, 'Francis', 'FR')

CodePudding user response:

It may be better with binding the rows first and then do a group by summarise

library(dplyr)
bind_rows(a, b) %>%
   group_by(id) %>%
    summarise(across(c('name', 'location'), list), .groups = 'drop')

-output

# A tibble: 4 × 3
     id name      location 
  <dbl> <list>    <list>   
1     1 <chr [2]> <chr [2]>
2     2 <chr [2]> <chr [2]>
3     3 <chr [1]> <chr [1]>
4     4 <chr [1]> <chr [1]>
  • Related