Home > Blockchain >  Replace values using information from another dataset
Replace values using information from another dataset

Time:09-16

I have a huge data frame and I need to replace some values using information from another data frame.

Here you can see an example of my data frames (much shorter):

#first df
first_column <- c(1,1,1,1,2,2,2,3,3)
second_column <- c(1,2,3,4,2,3,4,3,4)
df <- data.frame(first_column, second_column)

#Second df
first_column2 <- c(1,2,3,4)
second_column2 <- c(463,535,637,227)
df2 <- data.frame(first_column2, second_column2)

I would like to replace values of df using the information from df2

so:

1 -> 463

2 -> 535

3 -> 637

4 -> 227

In both columns so I would end up with a df like this one:

first_column3 <- c(463,463,463,463,535,535,535,637,637)
second_column3 <- c(463,535,637,227,535,637,227,637,227)
df3 <-data.frame(first_column3, second_column3)

I cannot do it mannually because I have a huge data frame. I have tried using ifelse and replace and different packages such as dyplr and tydyverse but I cannot make it work.

Any ideas about how I could transform my data automatically?

Thank you so much in advance.

CodePudding user response:

With match:

df[] <- sapply(df, function(x) df2$second_column2[match(x, df2$first_column2)])
  first_column second_column
1          463           463
2          463           535
3          463           637
4          463           227
5          535           535
6          535           637
7          535           227
8          637           637
9          637           227

CodePudding user response:

Using a named vector to match and replace

df[] <- with(df2, setNames(second_column2, first_column2))[as.matrix(df)]

-output

> df
  first_column second_column
1          463           463
2          463           535
3          463           637
4          463           227
5          535           535
6          535           637
7          535           227
8          637           637
9          637           227

Or using dplyr by converting the df2 to a named vector (deframe) and use that to match and replace

library(dplyr)
library(tibble)
df %>% 
   mutate(across(everything(), ~ deframe(df2)[.x]))
first_column second_column
1          463           463
2          463           535
3          463           637
4          463           227
5          535           535
6          535           637
7          535           227
8          637           637
9          637           227
  • Related