I have table 1 below:
A | B | C |
---|---|---|
1 | M | 12 |
2 | F | 13 |
3 | F | NA |
5 | M | 17 |
6 | F | NA |
7 | M | NA |
And table2:
A | B | C |
---|---|---|
3 | F | 11 |
6 | F | 19 |
7 | M | 15 |
I would like to replace the NA value in column C from table 1 with the value in column C from table 2 based on column A and column B in both tables 1 and 2.
How to write code in R?
CodePudding user response:
Once again rows_patch()
from dplyr
to the rescue. You will need at least dplyr
v.1.0.0.
# Read in data
table1 <- read.table(text = "A B C
1 M 12
2 F 13
3 F NA
5 M 17
6 F NA
7 M NA", h=T)
table2 <- read.table(text = "A B C
3 F 11
6 F 19
7 M 15", h = T)
# Replace NA values based on columns A and B
dplyr::rows_patch(table1, table2, by = c("A", "B"))
# A B C
# 1 1 M 12
# 2 2 F 13
# 3 3 F 11
# 4 5 M 17
# 5 6 F 19
# 6 7 M 15
From the docs:
rows_update() modifies existing rows (like UPDATE). Key values in y must be unique, and, by default, key values in y must exist in x.
rows_patch() works like rows_update() but only overwrites NA values.
CodePudding user response:
I was really surprised that I did not recognized rows_patch
before. Thank you for this @SamR.
Here is a 'classic' dplyr
with join and coalesce
:
library(dplyr)
table1 %>% left_join(table2, by= "A") %>%
mutate(C = coalesce(C.x, C.y)) %>%
select(A, B=B.x, C)
A B C
1 1 M 12
2 2 F 13
3 3 F 11
4 5 M 17
5 6 F 19
6 7 M 15
CodePudding user response:
data.table
option:
library(data.table)
setDT(table1)
setDT(table2)
table1[is.na(C), C := table2[.SD, on=.(A, B), x.C]]
table1
#> A B C
#> 1: 1 M 12
#> 2: 2 F 13
#> 3: 3 F 11
#> 4: 5 M 17
#> 5: 6 F 19
#> 6: 7 M 15
Created on 2022-07-11 by the reprex package (v2.0.1)