Home > front end >  R: update the values in df1 based on data in df2
R: update the values in df1 based on data in df2

Time:07-06

Hi I have two data frames (df1 and df2) with two shared variables (ID and Yr). I want to update the values in a third variable (value) in df1 with the new data in the respective value in df2. But below code does not update the value in df1, it seems the values are not passed to the corresponding cels in df1.

df1 = data.frame(ID = c("a","b","c","d","e") ,
                 Yr = c(2000,2001,2002,2003,2004), 
                 value= c(100,100,100,100, 100))
df2 = data.frame(ID = c("a","b","c") ,
                 Yr = c(2000,2001,2002), 
                 valuenew= c(200,150,120))


for (i in 1:nrow(df2)){
  id <- df2[i,'ID']
  year <- df2[i, 'Yr']
  valuenew<- df2[i, 'valuenew']
  df1[which (df1$ID == id & df1$Yr == year), 'value'] <- valuenew
}

the desired result

   ID  Yr   value
    a  2000   200
    b  2001   150
    c  2002   120
    d  2003   100
    e  2004   100

CodePudding user response:

Since dplyr version 1.0.0, you can use rows_update for this:

dplyr::rows_update(
    df1, 
    rename(df2, value=valuenew), 
    by = c("ID", "Yr")
)
#   ID   Yr value
# 1  a 2000   200
# 2  b 2001   150
# 3  c 2002   120
# 4  d 2003   100
# 5  e 2004   100

CodePudding user response:

We could use a join for this: For example left_join

library(dplyr)

left_join(df1, df2, by="ID") %>% 
  mutate(value = ifelse(!is.na(valuenew), valuenew, value)) %>% 
  select(ID, Yr=Yr.x, value)
 ID   Yr value
1  a 2000   200
2  b 2001   150
3  c 2002   120
4  d 2003   100
5  e 2004   100

CodePudding user response:

Option using data.table:

df1 = data.frame(ID = c("a","b","c","d","e") ,
                 Yr = c(2000,2001,2002,2003,2004), 
                 value= c(100,100,100,100, 100))
df2 = data.frame(ID = c("a","b","c") ,
                 Yr = c(2000,2001,2002), 
                 valuenew= c(200,150,120))

library(data.table)
setDT(df1)[df2, value := i.valuenew, on = .(ID, Yr)]
df1
#>    ID   Yr value
#> 1:  a 2000   200
#> 2:  b 2001   150
#> 3:  c 2002   120
#> 4:  d 2003   100
#> 5:  e 2004   100

Created on 2022-07-05 by the reprex package (v2.0.1)

CodePudding user response:

You can try this for loop

for(i in 1:nrow(df1)){
    y <- which(df1$Yr[i] == df2$Yr)
    if(length(y) > 0) df1$value[i] <- df2$valuenew[y]
}
  • Output
  ID   Yr value
1  a 2000   200
2  b 2001   150
3  c 2002   120
4  d 2003   100
5  e 2004   100

CodePudding user response:

Your example is working and updating df1 just fine.

However, to add one more solution, you can try the lines below without using a for loop or attaching extra packages:

key <- paste(df1$ID, df1$Yr)
values <- setNames(df2$value, paste(df2$ID, df2$Yr))[key]
df1$value[!is.na(values)] <- values[!is.na(values)]

Maybe something worth to mention in general for your problem, make sure you don't have any duplicated ID/Yr combinations in df2...

  •  Tags:  
  • r
  • Related