I have a dataset (df1) with about 40 columns including an ID variable with values that can have multiple observations over the thousands of rows of observations. Say I have another dataset (df2) with only about 4 columns and a few rows of data. The column names in df2 are found in df1 and the ID variable matches some of the observations in df1. I want to replace values in df1 with those of df2 whenever the ID value from df1 matches that of df2.
Here is an example: (I am omitting all 40 cols for simplicity in df1)
df1 <- data.frame(ID = c('a', 'b', 'a', 'd', 'e', 'd', 'f'),
var1 = c(40, 22, 12, 4, 0, 2, 1),
var2 = c(75, 55, 65, 15, 0, 2, 1),
var3 = c(9, 18, 81, 3, 0, 2, 1),
var4 = c(1, 11, 21, 61, 0, 2, 1),
var5 = c(-1, -2, -3, -4, 0, 2, 1),
var6 = c(0, 1, 0, 1, 0, 2, 1))
df2<- data.frame(ID = c('a', 'd', 'f'),
var2 = c("fish", "pig", "cow"),
var4 = c("pencil", "pen", "eraser"),
var5 = c("lamp", "rug", "couch"))
I would like the resulting df:
ID var1 var2 var3 var4 var5 var6
1 a 40 fish 9 pencil lamp 0
2 b 22 55 18 11 -2 1
3 a 12 fish 81 pencil lamp 0
4 d 4 pig 3 pen rug 1
5 e 0 0 0 0 0 0
6 d 2 pig 2 pen rug 2
7 f 1 cow 1 eraser couch 1
I think there is a tidyverse solution using mutate
across
and case_when
but I cannot figure out how to do this. Any help would be appreciated.
CodePudding user response:
An option is also to loop across
the column names from 'df2' in df1
, match
the 'ID' and coalesce
with the original column values
library(dplyr)
df1 %>%
mutate(across(any_of(names(df2)[-1]),
~ coalesce(df2[[cur_column()]][match(ID, df2$ID)], as.character(.x))))
-output
ID var1 var2 var3 var4 var5 var6
1 a 40 fish 9 pencil lamp 0
2 b 22 55 18 11 -2 1
3 a 12 fish 81 pencil lamp 0
4 d 4 pig 3 pen rug 1
5 e 0 0 0 0 0 0
6 d 2 pig 2 pen rug 2
7 f 1 cow 1 eraser couch 1
CodePudding user response:
library(tidyverse)
df1 %>%
mutate(row = row_number(), .before = 1) %>% # add row number
pivot_longer(-c(ID, row)) %>% # reshape long
mutate(value = as.character(value)) %>% # numbers as text like df2
left_join(df2 %>% # join to long version of df2
pivot_longer(-ID), by = c("ID", "name")
) %>%
mutate(new_val = coalesce(value.y, value.x)) %>% # preferentially use df2 val
select(-value.x, -value.y) %>%
pivot_wider(names_from = name, values_from = new_val) # reshape wide again
Result
# A tibble: 7 × 8
row ID var1 var2 var3 var4 var5 var6
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 a 40 fish 9 pencil lamp 0
2 2 b 22 55 18 11 -2 1
3 3 a 12 fish 81 pencil lamp 0
4 4 d 4 pig 3 pen rug 1
5 5 e 0 0 0 0 0 0
6 6 d 2 pig 2 pen rug 2
7 7 f 1 cow 1 eraser couch 1