Home > OS >  Merge overlapping datasets by column identifier?
Merge overlapping datasets by column identifier?

Time:12-29

I am trying to merge/join two datasets which have different data about the same samples with no rows in common. I would like to be able to merge them by the column names and have that add the rows from the smaller dataset to the larger, filling in NA for all columns that do not have information from the smaller dataset. I feel like this is something super easy that I'm just somehow not able to figure out.

2 tiny sample datasets:

df1 <- data.frame(team=c('A', 'B', 'C', 'D'),
                  points=c(88, 98, 104, 100),
                  league=c('Alpha', 'Beta', 'Gamma', 'Delta'))

  team points league
1    A     88  Alpha
2    B     98   Beta
3    C    104  Gamma
4    D    100  Delta

df2 <- data.frame(team=c('L', 'M','N', 'O', 'P', 'Q'),
                  points=c(43, 66, 77, 83, 12, 12),
                  league=c('Epsilon', 'Zeta', 'Eta', 'Theta', 'Iota', 'Kappa'),
                  rebounds=c(22, 31, 29, 20, 33, 44),
                  fouls=c(1, 3, 2, 4, 5, 1))

  team points  league rebounds fouls
1    L     43 Epsilon       22     1
2    M     66    Zeta       31     3
3    N     77     Eta       29     2
4    O     83   Theta       20     4
5    P     12    Iota       33     5
6    Q     12   Kappa       44     1

the output I would like to get would be:

df3<- data.frame(team=c('A', 'B', 'C', 'D', 'L', 'M','N', 'O', 'P', 'Q' ),
                 points=c(88, 98, 104, 100, 43, 66, 77, 83, 12, 12),
                 league=c('Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon', 'Zeta', 'Eta', 'Theta', 'Iota', 'Kappa'),
                 rebounds=c('NA', 'NA', 'NA', 'NA', 22, 31, 29, 20, 33, 44),
                 fouls= c('NA', 'NA', 'NA', 'NA',1, 3, 2, 4, 5, 1))

   team points  league rebounds fouls
1     A     88   Alpha       NA    NA
2     B     98    Beta       NA    NA
3     C    104   Gamma       NA    NA
4     D    100   Delta       NA    NA
5     L     43 Epsilon       22     1
6     M     66    Zeta       31     3
7     N     77     Eta       29     2
8     O     83   Theta       20     4
9     P     12    Iota       33     5
10    Q     12   Kappa       44     1

I tried transposing the dfs, but because they have no rows in common that does not work either. I thought about making an index, but I'm just learning about those and I'm not sure how I would do it or if that's the correct move.

CodePudding user response:

Use full_join and arrange

library(dplyr)
full_join(df2, df1) %>% 
    arrange(team)

-output

 team points  league rebounds fouls
1     A     88   Alpha       NA    NA
2     B     98    Beta       NA    NA
3     C    104   Gamma       NA    NA
4     D    100   Delta       NA    NA
5     L     43 Epsilon       22     1
6     M     66    Zeta       31     3
7     N     77     Eta       29     2
8     O     83   Theta       20     4
9     P     12    Iota       33     5
10    Q     12   Kappa       44     1

Or with rows_upsert

rows_upsert(df2, df1, by = c("team", "points", "league"))

CodePudding user response:

We could use bind_rows()

When row-binding, columns are matched by name, and any missing columns will be filled with NA:

library(dplyr)

bind_rows(df1, df2)
   team points  league rebounds fouls
1     A     88   Alpha       NA    NA
2     B     98    Beta       NA    NA
3     C    104   Gamma       NA    NA
4     D    100   Delta       NA    NA
5     L     43 Epsilon       22     1
6     M     66    Zeta       31     3
7     N     77     Eta       29     2
8     O     83   Theta       20     4
9     P     12    Iota       33     5
10    Q     12   Kappa       44     1

CodePudding user response:

Using base R, you could add the missing columns in df1 using setdiff() and then rbind them together:

df1[, setdiff(names(df2), names(df1))] <- NA

rbind(df1, df2)

Output:

#    team points  league rebounds fouls
# 1     A     88   Alpha       NA    NA
# 2     B     98    Beta       NA    NA
# 3     C    104   Gamma       NA    NA
# 4     D    100   Delta       NA    NA
# 5     L     43 Epsilon       22     1
# 6     M     66    Zeta       31     3
# 7     N     77     Eta       29     2
# 8     O     83   Theta       20     4
# 9     P     12    Iota       33     5
# 10    Q     12   Kappa       44     1
  • Related