Hi I have two data frames (df1 and df2) with two shared variables (ID and Yr). I want to update the values in a third variable (value) in df1 with the new data in the respective value in df2. But below code does not update the value in df1, it seems the values are not passed to the corresponding cels in df1.
df1 = data.frame(ID = c("a","b","c","d","e") ,
Yr = c(2000,2001,2002,2003,2004),
value= c(100,100,100,100, 100))
df2 = data.frame(ID = c("a","b","c") ,
Yr = c(2000,2001,2002),
valuenew= c(200,150,120))
for (i in 1:nrow(df2)){
id <- df2[i,'ID']
year <- df2[i, 'Yr']
valuenew<- df2[i, 'valuenew']
df1[which (df1$ID == id & df1$Yr == year), 'value'] <- valuenew
}
the desired result
ID Yr value
a 2000 200
b 2001 150
c 2002 120
d 2003 100
e 2004 100
CodePudding user response:
Since dplyr
version 1.0.0, you can use rows_update
for this:
dplyr::rows_update(
df1,
rename(df2, value=valuenew),
by = c("ID", "Yr")
)
# ID Yr value
# 1 a 2000 200
# 2 b 2001 150
# 3 c 2002 120
# 4 d 2003 100
# 5 e 2004 100
CodePudding user response:
We could use a join for this: For example left_join
library(dplyr)
left_join(df1, df2, by="ID") %>%
mutate(value = ifelse(!is.na(valuenew), valuenew, value)) %>%
select(ID, Yr=Yr.x, value)
ID Yr value
1 a 2000 200
2 b 2001 150
3 c 2002 120
4 d 2003 100
5 e 2004 100
CodePudding user response:
Option using data.table
:
df1 = data.frame(ID = c("a","b","c","d","e") ,
Yr = c(2000,2001,2002,2003,2004),
value= c(100,100,100,100, 100))
df2 = data.frame(ID = c("a","b","c") ,
Yr = c(2000,2001,2002),
valuenew= c(200,150,120))
library(data.table)
setDT(df1)[df2, value := i.valuenew, on = .(ID, Yr)]
df1
#> ID Yr value
#> 1: a 2000 200
#> 2: b 2001 150
#> 3: c 2002 120
#> 4: d 2003 100
#> 5: e 2004 100
Created on 2022-07-05 by the reprex package (v2.0.1)
CodePudding user response:
You can try this for
loop
for(i in 1:nrow(df1)){
y <- which(df1$Yr[i] == df2$Yr)
if(length(y) > 0) df1$value[i] <- df2$valuenew[y]
}
- Output
ID Yr value
1 a 2000 200
2 b 2001 150
3 c 2002 120
4 d 2003 100
5 e 2004 100
CodePudding user response:
Your example is working and updating df1
just fine.
However, to add one more solution, you can try the lines below without using a for loop or attaching extra packages:
key <- paste(df1$ID, df1$Yr)
values <- setNames(df2$value, paste(df2$ID, df2$Yr))[key]
df1$value[!is.na(values)] <- values[!is.na(values)]
Maybe something worth to mention in general for your problem, make sure you don't have any duplicated ID/Yr combinations in df2
...