Home > Mobile >  merge by id but replace old data with new data in some columns
merge by id but replace old data with new data in some columns

Time:03-16

Basically, I have a dataframe that has some data that need to be replaced with data in a different dataframe. Some of the observations need to be changed in only a few of many columns.

To illustrate, say I have a dataframe:

df1 <- data.frame(index = c('a', 'b', 'c', 'd', 'e'), 
                  var1 = c(40, 22, 12, 4, 0),
                  var2 = c(75, 55, 65, 15, 0),
                  var3 = c(9, 18, 81, 3, 0),
                  var4 = c(1, 11, 21, 61, 0),
                  var5 = c(-1, -2, -3, -4, 0),
                  var6 = c(0, 1, 0, 1, 0))

    index   var1   var2   var3   var4   var5   var6
1     a     40     75      9      1     -1      0
2     b     22     55     18     11     -2      1
3     c     12     65     81     21     -3      0
4     d      4     15      3     61     -4      1
5.    e      0      0      0     0       0      0

Then, some better data for observations of "b" and "c" come along, but only for variables "var2" and "var3":

df2 <- data.frame(index = c('a', 'b', 'c', 'd'), 
                  var1 = c(40, 22, 12, 4),
                  var2 = c(75, 550, 650, 15),
                  var3 = c(9, 180, 810, 3),
                  var4 = c(1, 11, 21, 61))

  index var1 var2 var3 var4
1     a   40   75    9    1
2     b   22  550  180   11
3     c   12  650  810   21
4     d    4   15    3   61

I want the resulting data frame:


   index var1 var2 var3 var4 var5 var6
1     a   40   75    9    1   -1    0
2     b   22  550  180   11   -2    1
3     c   12  650  810   21   -3    0
4     d    4   15    3   61   -4    1
5     e    0    0    0    0    0    0

Any tips (particularly any tidyverse solutions to offfer?) I have played around with mutate, across, and replace but can't seem to get it.

Clarification: I'd like the new values for var2 and var3 to be replaced with those from df2 wherever the index values are found in df1. Everything else should stay the same. The dataset I am working on has the index values show up many times. I tried the following, but something is wrong with the syntax:

df1 %>%
    mutate(across(c(var2, var3),
           ~if_else(index %in% df2$index),
                     .[df2$index],
                     .))

CodePudding user response:

Update II: after new clarification: removed the previous answer:

library(dplyr)

bind_rows(df1, df2) %>% 
  arrange(index) %>% 
  mutate(across(var1:var4, ~ifelse(index=="b" |
                                     index == "c", lead(.), .))) %>% 
  na.omit()
  index var1 var2 var3 var4 var5 var6
1     a   40   75    9    1   -1    0
3     b   22  550  180   11   -2    1
5     c   12  650  810   21   -3    0
7     d    4   15    3   61   -4    1
9     e    0    0    0    0    0    0

CodePudding user response:

Based on the data you gave in the OP, and the fact that the 'better' observations are larger numbers, the following seems to give the desired result:

library(dplyr)

df1 <- data.frame(index = c('a', 'b', 'c', 'd', 'e'), 
                  var1 = c(40, 22, 12, 4, 0),
                  var2 = c(75, 55, 65, 15, 0),
                  var3 = c(9, 18, 81, 3, 0),
                  var4 = c(1, 11, 21, 61, 0),
                  var5 = c(-1, -2, -3, -4, 0),
                  var6 = c(0, 1, 0, 1, 0))


df2 <- data.frame(index = c('a', 'b', 'c', 'd'), 
                  var1 = c(40, 22, 12, 4),
                  var2 = c(75, 550, 650, 15),
                  var3 = c(9, 180, 810, 3),
                  var4 = c(1, 11, 21, 61))


get_best <- function(col){
  value <- max(unlist(col), na.rm=TRUE)
  return(value)
}

a <- bind_rows(df1,df2) %>% 
  group_by(index) %>% 
  mutate(across(.cols=everything(),
                .fns=get_best)) %>% 
  unique()

Output:

# A tibble: 5 × 7
# Groups:   index [5]
  index  var1  var2  var3  var4  var5  var6
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a        40    75     9     1    -1     0
2 b        22   550   180    11    -2     1
3 c        12   650   810    21    -3     0
4 d         4    15     3    61    -4     1
5 e         0     0     0     0     0     0

  • Related