I am trying to merge/join two datasets which have different data about the same samples with no rows in common. I would like to be able to merge them by the column names and have that add the rows from the smaller dataset to the larger, filling in NA for all columns that do not have information from the smaller dataset. I feel like this is something super easy that I'm just somehow not able to figure out.
2 tiny sample datasets:
df1 <- data.frame(team=c('A', 'B', 'C', 'D'),
points=c(88, 98, 104, 100),
league=c('Alpha', 'Beta', 'Gamma', 'Delta'))
team points league
1 A 88 Alpha
2 B 98 Beta
3 C 104 Gamma
4 D 100 Delta
df2 <- data.frame(team=c('L', 'M','N', 'O', 'P', 'Q'),
points=c(43, 66, 77, 83, 12, 12),
league=c('Epsilon', 'Zeta', 'Eta', 'Theta', 'Iota', 'Kappa'),
rebounds=c(22, 31, 29, 20, 33, 44),
fouls=c(1, 3, 2, 4, 5, 1))
team points league rebounds fouls
1 L 43 Epsilon 22 1
2 M 66 Zeta 31 3
3 N 77 Eta 29 2
4 O 83 Theta 20 4
5 P 12 Iota 33 5
6 Q 12 Kappa 44 1
the output I would like to get would be:
df3<- data.frame(team=c('A', 'B', 'C', 'D', 'L', 'M','N', 'O', 'P', 'Q' ),
points=c(88, 98, 104, 100, 43, 66, 77, 83, 12, 12),
league=c('Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon', 'Zeta', 'Eta', 'Theta', 'Iota', 'Kappa'),
rebounds=c('NA', 'NA', 'NA', 'NA', 22, 31, 29, 20, 33, 44),
fouls= c('NA', 'NA', 'NA', 'NA',1, 3, 2, 4, 5, 1))
team points league rebounds fouls
1 A 88 Alpha NA NA
2 B 98 Beta NA NA
3 C 104 Gamma NA NA
4 D 100 Delta NA NA
5 L 43 Epsilon 22 1
6 M 66 Zeta 31 3
7 N 77 Eta 29 2
8 O 83 Theta 20 4
9 P 12 Iota 33 5
10 Q 12 Kappa 44 1
I tried transposing the dfs, but because they have no rows in common that does not work either. I thought about making an index, but I'm just learning about those and I'm not sure how I would do it or if that's the correct move.
CodePudding user response:
Use full_join
and arrange
library(dplyr)
full_join(df2, df1) %>%
arrange(team)
-output
team points league rebounds fouls
1 A 88 Alpha NA NA
2 B 98 Beta NA NA
3 C 104 Gamma NA NA
4 D 100 Delta NA NA
5 L 43 Epsilon 22 1
6 M 66 Zeta 31 3
7 N 77 Eta 29 2
8 O 83 Theta 20 4
9 P 12 Iota 33 5
10 Q 12 Kappa 44 1
Or with rows_upsert
rows_upsert(df2, df1, by = c("team", "points", "league"))
CodePudding user response:
We could use bind_rows()
When row-binding, columns are matched by name, and any missing columns will be filled with NA:
library(dplyr)
bind_rows(df1, df2)
team points league rebounds fouls
1 A 88 Alpha NA NA
2 B 98 Beta NA NA
3 C 104 Gamma NA NA
4 D 100 Delta NA NA
5 L 43 Epsilon 22 1
6 M 66 Zeta 31 3
7 N 77 Eta 29 2
8 O 83 Theta 20 4
9 P 12 Iota 33 5
10 Q 12 Kappa 44 1
CodePudding user response:
Using base R, you could add the missing columns in df1
using setdiff()
and then rbind
them together:
df1[, setdiff(names(df2), names(df1))] <- NA
rbind(df1, df2)
Output:
# team points league rebounds fouls
# 1 A 88 Alpha NA NA
# 2 B 98 Beta NA NA
# 3 C 104 Gamma NA NA
# 4 D 100 Delta NA NA
# 5 L 43 Epsilon 22 1
# 6 M 66 Zeta 31 3
# 7 N 77 Eta 29 2
# 8 O 83 Theta 20 4
# 9 P 12 Iota 33 5
# 10 Q 12 Kappa 44 1