Home > Software design >  Replace values in a dataframe by values of other dataframe
Replace values in a dataframe by values of other dataframe

Time:02-08

I have two dataframes, one old and one new. Both dataframes contain the same IDs but the values in the columns can differ amongst the dataframes. The old dataframe contains the 'correct' values and therefor need to be replaced in the new information.

This is the old dataframe:

structure(list(ID = c(1, 2, 3, 4, 5, 6, 7), a = c("hi", "ho", 
NA, "hu", "ha", NA, "he"), b = c(1, 1, NA, 1, 1, NA, 1), c = c("ri", 
"ro", NA, "ru", NA, NA, "re"), d = c(2, 2, NA, 2, NA, NA, 2)), row.names = c(NA, 
-7L), class = c("tbl_df", "tbl", "data.frame"))

 ID a         b c         d

 1 hi        1 ri        2
 2 ho        1 ro        2
 3 NA       NA NA       NA
 4 hu        1 ru        2
 5 ha        1 NA       NA
 6 NA       NA NA       NA
 7 he        1 re        2

And this the new dataframe:

structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 10, 11), a = c("hi", 
"ho", NA, "hu", "ha", NA, "he", "hii", "hoo"), b = c(3, 1, NA, 
1, 1, NA, 1, 1, 1), c = c("ri", "ro", NA, "ru", "ra", NA, "re", 
NA, "roo"), d = c(3, 2, NA, 2, 2, NA, 2, NA, 2)), row.names = c(NA, 
-9L), class = c("tbl_df", "tbl", "data.frame"))

  ID a         b c         d
     1 hi        3 ri        3
     2 ho        1 ro        2
     3 NA       NA NA       NA
     4 hu        1 ru        2
     5 ha        1 ra        2
     6 NA       NA NA       NA
     7 he        1 re        2
    10 hii       1 NA       NA
    11 hoo       1 roo       2

The IDs from the old df are included in the new df but in the new df there are also new IDs. These need to be included in the final output. The values that are different in the new df need to be replaced by the values in the old df. Also, new data can be added in the columns (ID number 5 has new data in columns c and d). These also need to be included in the final output.

The final output should look like:

structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 10, 11), a = c("hi", 
"ho", NA, "hu", "ha", NA, "he", "hii", "hoo"), b = c(1, 1, NA, 
1, 1, NA, 1, 1, 1), c = c("ri", "ro", NA, "ru", "ra", NA, "re", 
NA, "roo"), d = c(2, 2, NA, 2, 2, NA, 2, NA, 2)), row.names = c(NA, 
-9L), class = c("tbl_df", "tbl", "data.frame"))



  ID a         b c         d
     1 hi        1 ri        2
     2 ho        1 ro        2
     3 NA       NA NA       NA
     4 hu        1 ru        2
     5 ha        1 ra        2
     6 NA       NA NA       NA
     7 he        1 re        2
    10 hii       1 NA       NA
    11 hoo       1 roo       2

Can anyone help achieve this? I've seen the function match but this seems to only work for certain columns.

Thanks a lot!

CodePudding user response:

Call first sample df old_df, call second new_df. It sounds like essentially you want to update rows in new_df with values from old_df, retaining all non-matching rows in new_df:

library(dplyr)
new_df %>% rows_update(old_df, by = "ID")

Gives:

# A tibble: 9 x 5
     ID a         b c         d
  <dbl> <chr> <dbl> <chr> <dbl>
1     1 hi        1 ri        2
2     2 ho        1 ro        2
3     3 NA       NA NA       NA
4     4 hu        1 ru        2
5     5 ha        1 NA       NA
6     6 NA       NA NA       NA
7     7 he        1 re        2
8    10 hii       1 NA       NA
9    11 hoo       1 roo       2
  •  Tags:  
  • Related