replace NA in a table with values in column with another table by conditions in R-CodePudding

I have table 1 below:

A	B	C
1	M	12
2	F	13
3	F	NA
5	M	17
6	F	NA
7	M	NA

And table2:

A	B	C
3	F	11
6	F	19
7	M	15

I would like to replace the NA value in column C from table 1 with the value in column C from table 2 based on column A and column B in both tables 1 and 2.

How to write code in R?

CodePudding user response：

Once again rows_patch() from dplyr to the rescue. You will need at least dplyr v.1.0.0.

# Read in data
table1  <- read.table(text = "A B   C
1   M   12
2   F   13
3   F   NA
5   M   17
6   F   NA
7   M   NA", h=T)

table2  <- read.table(text = "A B   C
3   F   11
6   F   19
7   M   15", h = T)


# Replace NA values based on columns A and B
dplyr::rows_patch(table1, table2, by = c("A", "B"))

#   A B  C
# 1 1 M 12
# 2 2 F 13
# 3 3 F 11
# 4 5 M 17
# 5 6 F 19
# 6 7 M 15

From the docs:

rows_update() modifies existing rows (like UPDATE). Key values in y must be unique, and, by default, key values in y must exist in x.

rows_patch() works like rows_update() but only overwrites NA values.

CodePudding user response：

I was really surprised that I did not recognized rows_patch before. Thank you for this @SamR.

Here is a 'classic' dplyr with join and coalesce:

library(dplyr)

table1 %>% left_join(table2, by= "A") %>%
  mutate(C = coalesce(C.x, C.y)) %>% 
  select(A, B=B.x, C)

CodePudding user response：

data.table option:

library(data.table)
setDT(table1) 
setDT(table2)
table1[is.na(C), C := table2[.SD, on=.(A, B), x.C]]
table1
#>    A B  C
#> 1: 1 M 12
#> 2: 2 F 13
#> 3: 3 F 11
#> 4: 5 M 17
#> 5: 6 F 19
#> 6: 7 M 15

^{Created on 2022-07-11 by the reprex package (v2.0.1)}