replace values across columns in a dataframe when index variable matches to another dataframe in r-CodePudding

I have a dataset (df1) with about 40 columns including an ID variable with values that can have multiple observations over the thousands of rows of observations. Say I have another dataset (df2) with only about 4 columns and a few rows of data. The column names in df2 are found in df1 and the ID variable matches some of the observations in df1. I want to replace values in df1 with those of df2 whenever the ID value from df1 matches that of df2.

Here is an example: (I am omitting all 40 cols for simplicity in df1)



df1 <-  data.frame(ID = c('a', 'b', 'a', 'd', 'e', 'd', 'f'), 
                   var1 = c(40, 22, 12, 4, 0, 2, 1),
                   var2 = c(75, 55, 65, 15, 0, 2, 1),
                   var3 = c(9, 18, 81, 3, 0, 2, 1),
                   var4 = c(1, 11, 21, 61, 0, 2, 1),
                   var5 = c(-1, -2, -3, -4, 0, 2, 1),
                   var6 = c(0, 1, 0, 1, 0, 2, 1))

df2<- data.frame(ID = c('a', 'd', 'f'), 
                 var2 = c("fish", "pig", "cow"),
                 var4 = c("pencil", "pen", "eraser"),
                 var5 = c("lamp", "rug", "couch"))

I would like the resulting df:

   ID var1  var2  var3 var4    var5  var6
1  a   40   fish  9    pencil  lamp   0
2  b   22   55    18   11      -2     1
3  a   12   fish  81   pencil  lamp   0
4  d    4   pig   3    pen     rug    1
5  e    0   0     0    0        0     0
6  d    2   pig   2    pen      rug   2
7  f    1   cow   1    eraser   couch 1

I think there is a tidyverse solution using mutate across and case_when but I cannot figure out how to do this. Any help would be appreciated.

CodePudding user response：

An option is also to loop across the column names from 'df2' in df1, match the 'ID' and coalesce with the original column values

library(dplyr)
df1 %>% 
   mutate(across(any_of(names(df2)[-1]),
     ~ coalesce(df2[[cur_column()]][match(ID, df2$ID)], as.character(.x))))

-output

  ID var1 var2 var3   var4  var5 var6
1  a   40 fish    9 pencil  lamp    0
2  b   22   55   18     11    -2    1
3  a   12 fish   81 pencil  lamp    0
4  d    4  pig    3    pen   rug    1
5  e    0    0    0      0     0    0
6  d    2  pig    2    pen   rug    2
7  f    1  cow    1 eraser couch    1

CodePudding user response：

library(tidyverse)
df1 %>% 
  mutate(row = row_number(), .before = 1) %>%       # add row number
  pivot_longer(-c(ID, row)) %>%                     # reshape long
  mutate(value = as.character(value)) %>%           # numbers as text like df2
  left_join(df2 %>%                                 # join to long version of df2
    pivot_longer(-ID), by = c("ID", "name")
  ) %>%
  mutate(new_val = coalesce(value.y, value.x)) %>%  # preferentially use df2 val
  select(-value.x, -value.y) %>%
  pivot_wider(names_from = name, values_from = new_val)  # reshape wide again

Result

# A tibble: 7 × 8
    row ID    var1  var2  var3  var4   var5  var6 
  <int> <chr> <chr> <chr> <chr> <chr>  <chr> <chr>
1     1 a     40    fish  9     pencil lamp  0    
2     2 b     22    55    18    11     -2    1    
3     3 a     12    fish  81    pencil lamp  0    
4     4 d     4     pig   3     pen    rug   1    
5     5 e     0     0     0     0      0     0    
6     6 d     2     pig   2     pen    rug   2    
7     7 f     1     cow   1     eraser couch 1