I have the following data frame:
library(dplyr)
old_data = data.frame(id = c(1,2,3), var1 = c(11,12,13))
> old_data
id var1
1 1 11
2 2 12
3 3 13
I want to replace the values in the 2nd row of "old_data" with the data in "new_data" (i.e. update the rows in "old_data" whose id matches an id in "new_data"):
new_data = data.frame(id = c(4,2,5), var1 = c(11,15,13))
> new_data
id var1
1 4 11
2 2 15
3 5 13
Using the answer found here (Update rows of data frame in R), I tried to do this with the "dplyr" library:
update = old_data %>%
rows_update(new_data, by = "id")
But this gave me the following error:
Error: Attempting to update missing rows.
Run `rlang::last_error()` to see where the error occurred.
This is what I am trying to get:
id var1
1 1 11
2 2 15
3 3 13
Can someone please tell me what I am doing wrong?
Thanks!
CodePudding user response:
A little bit messy but this works (on this sample data at least)
old_data %>%
left_join(new_data,by="id") %>%
mutate(var1 = if_else(!is.na(var1.y),var1.y,var1.x)) %>%
select(id,var1)
# id var1
#1 1 11
#2 2 15
#3 3 13
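The same join can be written a bit more compactly with coalesce(), which takes the first non-missing value at each position. This is only a sketch of the idea on the sample data from the question, and it assumes var1 has the same type in both data frames:

```r
library(dplyr)

old_data <- data.frame(id = c(1, 2, 3), var1 = c(11, 12, 13))
new_data <- data.frame(id = c(4, 2, 5), var1 = c(11, 15, 13))

updated <- old_data %>%
  # var1 from old_data becomes var1.x, var1 from new_data becomes var1.y
  left_join(new_data, by = "id") %>%
  # prefer the new value when there is one, otherwise keep the old value
  mutate(var1 = coalesce(var1.y, var1.x)) %>%
  select(id, var1)

updated
```

Note that, like the if_else() version, this keeps the old value whenever the new value is NA, so it cannot be used to overwrite a value with NA.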
CodePudding user response:
A base R approach using match:
inds <- match(old_data$id, new_data$id)
old_data$var1[!is.na(inds)] <- na.omit(new_data$var1[inds])
old_data
# id var1
#1 1 11
#2 2 15
#3 3 13
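To see why this works, it may help to look at what match() returns: for each id in old_data it gives the position of that id in new_data, or NA when there is none. The sketch below uses the sample data from the question and a slightly more defensive variant that indexes with the matched positions directly instead of na.omit(), so an NA in new_data$var1 would not shift the values (the name has_match is just for illustration):

```r
old_data <- data.frame(id = c(1, 2, 3), var1 = c(11, 12, 13))
new_data <- data.frame(id = c(4, 2, 5), var1 = c(11, 15, 13))

# Position of each old id within new_data$id; NA means "no match"
inds <- match(old_data$id, new_data$id)
# inds is NA, 2, NA: only id 2 is found, at position 2 of new_data

# Rows of old_data that have a counterpart in new_data
has_match <- !is.na(inds)

# Overwrite only those rows, taking values from the matched positions
old_data$var1[has_match] <- new_data$var1[inds[has_match]]
old_data
```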
CodePudding user response:
A data.table approach (converting the data.table back into a data frame afterwards):
library(data.table)
as.data.frame(setDT(old_data)[new_data, var1 := .(i.var1), on = "id"])
Output
id var1
1 1 11
2 2 15
3 3 13
An alternative tidyverse option using rows_update. You can filter new_data to only have ids that appear in old_data. Then, you can update those values, like you had previously tried. Essentially, new_data must only have id values that appear in old_data.
library(tidyverse)
old_data %>%
rows_update(., new_data %>% filter(id %in% old_data$id), by = "id")
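If you want to check which rows of new_data are responsible for the "Attempting to update missing rows" error, one way is an anti_join, which keeps the rows of its first argument that have no match in the second. A small sketch on the sample data from the question (the name unmatched_rows is just for illustration):

```r
library(dplyr)

old_data <- data.frame(id = c(1, 2, 3), var1 = c(11, 12, 13))
new_data <- data.frame(id = c(4, 2, 5), var1 = c(11, 15, 13))

# Rows of new_data whose id does not exist in old_data;
# these are the rows rows_update() complains about
unmatched_rows <- anti_join(new_data, old_data, by = "id")
unmatched_rows
```

Here that returns the rows with id 4 and 5, which is why they have to be dropped before calling rows_update.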
Data
old_data <-
structure(list(id = c(1, 2, 3), var1 = c(11, 12, 13)),
class = "data.frame",
row.names = c(NA,-3L))
new_data <-
structure(list(id = c(4, 2, 5), var1 = c(11, 15, 13)),
class = "data.frame",
row.names = c(NA,-3L))
CodePudding user response:
We can use dplyr::rows_update if we first use a semi_join on new_data to keep only those ids that are included in old_data.
library(dplyr)
old_data %>%
rows_update(new_data %>%
semi_join(old_data, by = "id"),
by = "id")
#> id var1
#> 1 1 11
#> 2 2 15
#> 3 3 13
Created on 2021-12-29 by the reprex package (v0.3.0)
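As a side note, newer versions of dplyr (1.1.0 and later, so this assumes you can upgrade) added an unmatched argument to rows_update. Setting it to "ignore" skips the rows of new_data whose id is not in old_data, so the filtering step should no longer be needed. A sketch on the sample data from the question:

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for the `unmatched` argument

old_data <- data.frame(id = c(1, 2, 3), var1 = c(11, 12, 13))
new_data <- data.frame(id = c(4, 2, 5), var1 = c(11, 15, 13))

# Rows of new_data with ids 4 and 5 have no match and are silently ignored
result <- old_data %>%
  rows_update(new_data, by = "id", unmatched = "ignore")

result
```

On older dplyr versions this argument does not exist, so the semi_join or filter approach is still the way to go there.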