Updating Values in New Data with Old Data

Time:12-29

I have the following data frame:

library(dplyr)

old_data = data.frame(id = c(1,2,3), var1 = c(11,12,13))

> old_data

  id var1
1  1   11
2  2   12
3  3   13

I want to replace the values in "old_data" with the values from "new_data" wherever the id variable matches (here, only the 2nd row of "old_data", where id = 2):

new_data = data.frame(id = c(4,2,5), var1 = c(11,15,13))

> new_data

  id var1
1  4   11
2  2   15
3  5   13

Using the answer found here (Update rows of data frame in R), I tried to do this with the "dplyr" library:

update =  old_data %>%
     rows_update(new_data, by = "id")

But this gave me the following error:

Error: Attempting to update missing rows.
Run `rlang::last_error()` to see where the error occurred.

This is what I am trying to get:

  id var1
1  1   11
2  2   15
3  3   13

Can someone please tell me what I am doing wrong?

Thanks!

CodePudding user response:

This is a little messy, but it works (at least on this sample data): left-join new_data onto old_data, then take the new value wherever a match exists.

old_data %>% 
  left_join(new_data,by="id") %>% 
  mutate(var1 = if_else(!is.na(var1.y),var1.y,var1.x)) %>% 
  select(id,var1)

#  id var1
#1  1   11
#2  2   15
#3  3   13
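
The same join can be written a little more compactly with coalesce(), which returns the first non-missing value. This is only a sketch on the sample data; the suffixes "_old" and "_new" are names chosen here for illustration. One caveat: an NA in new_data$var1 for a matching id cannot be told apart from "no match", so the old value would be kept in that case.

```r
library(dplyr)

old_data <- data.frame(id = c(1, 2, 3), var1 = c(11, 12, 13))
new_data <- data.frame(id = c(4, 2, 5), var1 = c(11, 15, 13))

result <- old_data %>%
  # keep every row of old_data; unmatched ids get NA in var1_new
  left_join(new_data, by = "id", suffix = c("_old", "_new")) %>%
  # prefer the new value, fall back to the old one
  mutate(var1 = coalesce(var1_new, var1_old)) %>%
  select(id, var1)

result
```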

CodePudding user response:

A base R approach using match -

inds <- match(old_data$id, new_data$id)   # position of each old id in new_data (NA if absent)
hit <- !is.na(inds)                       # rows of old_data that have a match
old_data$var1[hit] <- new_data$var1[inds[hit]]
old_data

#  id var1
#1  1   11
#2  2   15
#3  3   13
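
If the same update is needed for several columns, the match() idea can be wrapped in a small function. The helper below, update_by_key(), is a hypothetical name and not part of base R; it is a sketch that assumes the key is unique within the new data (match() only uses the first occurrence).

```r
old_data <- data.frame(id = c(1, 2, 3), var1 = c(11, 12, 13))
new_data <- data.frame(id = c(4, 2, 5), var1 = c(11, 15, 13))

# Hypothetical helper: overwrite `cols` in `old` with values from `new`
# for rows whose `key` matches. Assumes `key` is unique within `new`.
update_by_key <- function(old, new, key, cols) {
  idx <- match(old[[key]], new[[key]])  # row of `new` for each row of `old`, NA if none
  hit <- !is.na(idx)                    # rows of `old` that have a match
  for (col in cols) {
    old[[col]][hit] <- new[[col]][idx[hit]]
  }
  old
}

result <- update_by_key(old_data, new_data, key = "id", cols = "var1")
result
```

Because the function returns a modified copy, old_data itself is left unchanged.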

CodePudding user response:

A data.table approach using an update join (the result is converted back to a data frame):

library(data.table)

as.data.frame(setDT(old_data)[new_data, var1 := .(i.var1), on = "id"])

Output

  id var1
1  1   11
2  2   15
3  3   13
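
Note that setDT() converts old_data to a data.table by reference and := then updates it in place, so the original object is changed as well. If that is not wanted, a sketch along these lines works on a copy instead; old_dt and new_dt are names introduced here for illustration.

```r
library(data.table)

old_data <- data.frame(id = c(1, 2, 3), var1 = c(11, 12, 13))
new_data <- data.frame(id = c(4, 2, 5), var1 = c(11, 15, 13))

old_dt <- as.data.table(old_data)  # makes a copy; old_data is not touched
new_dt <- as.data.table(new_data)

# update join: for ids present in both tables, take var1 from new_dt
old_dt[new_dt, var1 := i.var1, on = "id"]

result <- as.data.frame(old_dt)
result
```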

An alternative tidyverse option uses rows_update(), as you had tried. Your call fails because rows_update() requires every id in new_data to already exist in old_data. So first filter new_data down to the ids that appear in old_data, then update:

library(dplyr)

old_data %>% 
  rows_update(., new_data %>% filter(id %in% old_data$id), by = "id")
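
In more recent dplyr releases (1.1.0 and later, as far as I know), rows_update() also has an unmatched argument, so the filtering step may not be needed at all. This is a sketch that assumes your installed version supports that argument; check ?rows_update if it errors.

```r
library(dplyr)

old_data <- data.frame(id = c(1, 2, 3), var1 = c(11, 12, 13))
new_data <- data.frame(id = c(4, 2, 5), var1 = c(11, 15, 13))

# unmatched = "ignore" skips rows of new_data whose id is not in old_data
# (assumes dplyr >= 1.1.0; the default, "error", gives the error in the question)
result <- rows_update(old_data, new_data, by = "id", unmatched = "ignore")
result
```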

Data

old_data <-
  structure(list(id = c(1, 2, 3), var1 = c(11, 12, 13)),
            class = "data.frame",
            row.names = c(NA,-3L))
new_data <-
  structure(list(id = c(4, 2, 5), var1 = c(11, 15, 13)),
            class = "data.frame",
            row.names = c(NA,-3L))

CodePudding user response:

We can use dplyr::rows_update() if we first apply semi_join() to new_data, which keeps only the rows whose ids are present in old_data.

library(dplyr)

old_data %>% 
  rows_update(new_data %>%
                semi_join(old_data, by = "id"),
              by = "id")

#>   id var1
#> 1  1   11
#> 2  2   15
#> 3  3   13

Created on 2021-12-29 by the reprex package (v0.3.0)
