Home > Back-end >  How do I pivot and join 3 different tables by a ID column using dplyr?
How do I pivot and join 3 different tables by a ID column using dplyr?

Time:12-23

Say I want to join these 3 dataframes with dplyr. How do I do it? I know I should use some combination of pivots and joins, but I can't figure out how to get it right.

My goal is to have the df something like this:

mpg_deciles mean_mpg mean_price production coefficient
1           13.5     12990      Foreign    12990
2           16       10874      Domestic   10874.8571428572

Heres the data

library(dplyr)

a <- tibble::tribble(
  ~mpg_deciles,        ~mean_mpg,
  1L,             13.5,
  2L,               16,
  3L,            17.75,
  4L,           18.625,
  5L, 19.7142857142857)


 b <- tibble::tribble(
    ~coeff_foreign, ~mpg_deciles,  ~mean_p_foreign,  ~foreign,
    12990,            2,            12990, "Foreign",
    -2147.49999999997,            3,          10842.5, "Foreign",
    -7180.99999999996,            4, 5809.00000000003, "Foreign",
    -6777.49999999999,            6,           6212.5, "Foreign",
    -6435.3333333333,            7, 6554.66666666669, "Foreign")
  
  
c <- tibble::tribble(
    ~coeff_domestic, ~mpg_deciles, ~mean_p_domestic,   ~foreign,
    10874.8571428572,           1L, 10874.8571428572, "Domestic",
    -3697.73214285716,           2L,         7177.125, "Domestic",
    -6031.19047619049,           3L, 4843.66666666666, "Domestic",
    -6365.35714285716,           4L,           4509.5, "Domestic",
    -4650.42857142859,           5L, 6224.42857142857, "Domestic")

CodePudding user response:

We could use a nested left_join:

library(dplyr)

library(dplyr)
left_join(a, b, by='mpg_deciles') %>%
  left_join(., c, by='mpg_deciles') 
  mpg_deciles mean_mpg coeff_foreign mean_p_foreign foreign.x coeff_domestic mean_p_domestic foreign.y
        <dbl>    <dbl>         <dbl>          <dbl> <chr>              <dbl>           <dbl> <chr>    
1           1     13.5           NA             NA  NA                10875.          10875. Domestic 
2           2     16          12990          12990  Foreign           -3698.           7177. Domestic 
3           3     17.8        -2147.         10842. Foreign           -6031.           4844. Domestic 
4           4     18.6        -7181.          5809. Foreign           -6365.           4510. Domestic 
5           5     19.7           NA             NA  NA                -4650.           6224. Domestic 

CodePudding user response:

I think you need to pre-process b and c and then use a left_join:

library(dplyr)

a %>% 
  left_join(
    bind_rows(
      b %>% 
        rename(coefficient = coeff_foreign, mean_price = mean_p_foreign, production = foreign),
      c %>%     
        rename(coefficient = coeff_domestic, mean_price = mean_p_domestic, production = foreign)
      ),
    by = "mpg_deciles"
  )

This returns

# A tibble: 8 x 5
  mpg_deciles mean_mpg coefficient mean_price production
        <dbl>    <dbl>       <dbl>      <dbl> <chr>     
1           1     13.5      10875.     10875. Domestic  
2           2     16        12990      12990  Foreign   
3           2     16        -3698.      7177. Domestic  
4           3     17.8      -2147.     10842. Foreign   
5           3     17.8      -6031.      4844. Domestic  
6           4     18.6      -7181.      5809. Foreign   
7           4     18.6      -6365.      4510. Domestic  
8           5     19.7      -4650.      6224. Domestic  
  • Related