I have a data frame (DF1) that I want to fill with values from another data frame (DF2) if the columns names match. If there's NA, or 0 I don't want to change DF1 columns but if theres 1 or 2, I'd like to modify with DF2 corresponding value.
Example: df1
sampleID | ID1 | ID2 | ID3 |
---|---|---|---|
A | 0 | NA | 0 |
B | 1 | 0 | 1 |
C | 1 | 2 | 0 |
DF2
V1 | V2 |
---|---|
ID1 | 0.5 |
ID2 | 0.7 |
ID3 | 0.9 |
Desired result:
sampleID | ID1 | ID2 | ID3 |
---|---|---|---|
A | 0 | NA | 0 |
B | 0.5 | 0 | 0.9 |
C | 0.5 | 0.7 | 0 |
If there's a NA value, I don't want any change. I've worked on an apply() custom function based solution, but it's not working too well. Interested by any idea you might have. Also dplyr solutions are welcome!
CodePudding user response:
In base R, you can use sapply
replace
. Note that this works because IDs in df1
and df2
are in the same order.
df1[-1] <- sapply(2:ncol(df1),
function(x) replace(df1[x], df1[x] != 0 & !is.na(df1[x]), df2[x-1, 2]))
# sampleID ID1 ID2 ID3
# 1 A 0.0 NA 0.0
# 2 B 0.5 0.0 0.9
# 3 C 0.5 0.7 0.0
Or an equivalent solution with match
(this one does not rely on ordering):
sapply(2:ncol(df1),
function(x) replace(df1[x],
df1[x] != 0 & !is.na(df1[x]),
df2$V2[match(names(df1[x]), df2$V1)]))
CodePudding user response:
# Create a named vector
lut = setNames(df2$V2, df2$V1)
cols = intersect(names(lut), names(df1))
# [1] "ID1" "ID2" "ID3"
df1[cols] <-
lapply(cols, \(x) replace(df1[[x]], !(is.na(df1[[x]]) | df1[[x]] == 0), lut[x]))
# sampleID ID1 ID2 ID3
# 1 A 0.0 NA 0.0
# 2 B 0.5 0.0 0.9
# 3 C 0.5 0.7 0.0
Data
df1 <- data.frame(sampleID = c("A", "B", "C"), ID1 = c(0L, 1L, 1L ), ID2 = c(NA, 0L, 2L), ID3 = c(0L, 1L, 0L))
df2 <- data.frame(V1 = c("ID1", "ID2", "ID3"), V2 = c(0.5, 0.7, 0.9))
CodePudding user response:
One way to solve your problem:
df3 = df1
df3[df2$V1] = with(df2, Map(\(x,y) ifelse(x %in% c(0,NA), x, V2[V1==y]), df1[V1], V1))
sampleID ID1 ID2 ID3
1 A 0.5 NA 0.9
2 B 1.0 0.7 1.0
3 C 1.0 0.7 0.9