Home > database >  replace values across columns in a dataframe when index variable matches to another dataframe in r
replace values across columns in a dataframe when index variable matches to another dataframe in r

Time:03-17

I have a dataset (df1) with about 40 columns including an ID variable with values that can have multiple observations over the thousands of rows of observations. Say I have another dataset (df2) with only about 4 columns and a few rows of data. The column names in df2 are found in df1 and the ID variable matches some of the observations in df1. I want to replace values in df1 with those of df2 whenever the ID value from df1 matches that of df2.

Here is an example: (I am omitting all 40 cols for simplicity in df1)



df1 <-  data.frame(ID = c('a', 'b', 'a', 'd', 'e', 'd', 'f'), 
                   var1 = c(40, 22, 12, 4, 0, 2, 1),
                   var2 = c(75, 55, 65, 15, 0, 2, 1),
                   var3 = c(9, 18, 81, 3, 0, 2, 1),
                   var4 = c(1, 11, 21, 61, 0, 2, 1),
                   var5 = c(-1, -2, -3, -4, 0, 2, 1),
                   var6 = c(0, 1, 0, 1, 0, 2, 1))

df2<- data.frame(ID = c('a', 'd', 'f'), 
                 var2 = c("fish", "pig", "cow"),
                 var4 = c("pencil", "pen", "eraser"),
                 var5 = c("lamp", "rug", "couch"))

I would like the resulting df:

   ID var1  var2  var3 var4    var5  var6
1  a   40   fish  9    pencil  lamp   0
2  b   22   55    18   11      -2     1
3  a   12   fish  81   pencil  lamp   0
4  d    4   pig   3    pen     rug    1
5  e    0   0     0    0        0     0
6  d    2   pig   2    pen      rug   2
7  f    1   cow   1    eraser   couch 1

I think there is a tidyverse solution using mutate across and case_when but I cannot figure out how to do this. Any help would be appreciated.

CodePudding user response:

An option is also to loop across the column names from 'df2' in df1, match the 'ID' and coalesce with the original column values

library(dplyr)
df1 %>% 
   mutate(across(any_of(names(df2)[-1]),
     ~ coalesce(df2[[cur_column()]][match(ID, df2$ID)], as.character(.x))))

-output

  ID var1 var2 var3   var4  var5 var6
1  a   40 fish    9 pencil  lamp    0
2  b   22   55   18     11    -2    1
3  a   12 fish   81 pencil  lamp    0
4  d    4  pig    3    pen   rug    1
5  e    0    0    0      0     0    0
6  d    2  pig    2    pen   rug    2
7  f    1  cow    1 eraser couch    1

CodePudding user response:

library(tidyverse)
df1 %>% 
  mutate(row = row_number(), .before = 1) %>%       # add row number
  pivot_longer(-c(ID, row)) %>%                     # reshape long
  mutate(value = as.character(value)) %>%           # numbers as text like df2
  left_join(df2 %>%                                 # join to long version of df2
    pivot_longer(-ID), by = c("ID", "name")
  ) %>%
  mutate(new_val = coalesce(value.y, value.x)) %>%  # preferentially use df2 val
  select(-value.x, -value.y) %>%
  pivot_wider(names_from = name, values_from = new_val)  # reshape wide again

Result

# A tibble: 7 × 8
    row ID    var1  var2  var3  var4   var5  var6 
  <int> <chr> <chr> <chr> <chr> <chr>  <chr> <chr>
1     1 a     40    fish  9     pencil lamp  0    
2     2 b     22    55    18    11     -2    1    
3     3 a     12    fish  81    pencil lamp  0    
4     4 d     4     pig   3     pen    rug   1    
5     5 e     0     0     0     0      0     0    
6     6 d     2     pig   2     pen    rug   2    
7     7 f     1     cow   1     eraser couch 1 
  • Related