merge two uneven dataframes by ID and fill in missing values-CodePudding

I'm new to r. This is my first forum question... I'm trying to merge two datasets like these:

df1 <- data.frame(ID = letters[1:5],
                  x = 5:9,
                  y = c(NA,6,5,NA,NA))

> df1
  ID x  y
1  a 5 NA
2  b 6  6
3  c 7  5
4  d 8 NA
5  e 9 NA


df2 <- data.frame(ID = letters[4:8],
                  y = 7:11,
                  z = c(4,3,NA,2,1))

> df2
  ID  y  z
1  d  7  4
2  e  8  3
3  f  9 NA
4  g 10  2
5  h 11  1

outcome should look like this:

> df3
  ID  x  y  z
1  a  5 NA NA
2  b  6  6 NA
3  c  7  5 NA
4  d  8  7  4
5  e  9  8  3
6  f NA  9 NA
7  g NA 10  2
8  h NA 11  1

I looked around in various forums but couldn't find the solution to this specific problem :/

Any suggestions are much appreciated!

CodePudding user response：

You can first do a full_join, then use coalesce to replace NA with the non-missing one.

The final relocate function is just to order the column according to your desired format.

library(tidyverse)

full_join(df1, df2, by = "ID") %>% 
  mutate(y = coalesce(y.x, y.y), .keep = "unused") %>% 
  relocate(y, .after = x)

  ID  x  y  z
1  a  5 NA NA
2  b  6  6 NA
3  c  7  5 NA
4  d  8  7  4
5  e  9  8  3
6  f NA  9 NA
7  g NA 10  2
8  h NA 11  1

CodePudding user response：

We can use {powerjoin} :

library(powerjoin)

power_full_join(df1, df2, by = "ID", conflict = coalesce_xy)
#>   ID  x  z  y
#> 1  a  5 NA NA
#> 2  b  6 NA  6
#> 3  c  7 NA  5
#> 4  d  8  4  7
#> 5  e  9  3  8
#> 6  f NA NA  9
#> 7  g NA  2 10
#> 8  h NA  1 11

^{Created on 2022-04-14 by the reprex package (v2.0.1)}

CodePudding user response：

data.table approach

library(data.table)
DT <- rbindlist(list(df1, df2), use.names = TRUE, fill = TRUE)
final <- DT[, lapply(.SD, max, na.rm = TRUE),keyby = ID]
# replace infinite values with NA
for (j in 1:ncol(final)) set(final, which(is.infinite(final[[j]])), j, NA)
#    ID  x  y  z
# 1:  a  5 NA NA
# 2:  b  6  6 NA
# 3:  c  7  5 NA
# 4:  d  8  7  4
# 5:  e  9  8  3
# 6:  f NA  9 NA
# 7:  g NA 10  2
# 8:  h NA 11  1

CodePudding user response：

A base R option using merge aggregate

aggregate(
    . ~ ID,
    merge(df1, df2, all = TRUE),
    function(x) ifelse(all(is.na(x)), NA, na.omit(x)),
    na.action = na.pass
)

gives

  ID  y  x  z
1  a NA  5 NA
2  b  6  6 NA
3  c  5  7 NA
4  d  7  8  4
5  e  8  9  3
6  f  9 NA NA
7  g 10 NA  2
8  h 11 NA  1