Home > Net >  Match and replace values from one dataframe with another
Match and replace values from one dataframe with another

Time:07-28

Let's say I have two dataframes like below, (real dataset has many more rows and cols)

df = data.frame("Worker" = c("JBB","JDD","MB","JBB"),
                 "Age" = c(4,5,6,4))

df2 = data.frame("Initials" = c("JBB","JDD","MB","JOD"),
                 "Worker" = c("Joe Bloggs/JBB", "Jane Doe/JDD", 
                                "Mr. Big/MB", "John Doe/JOD"))

I would like to replace the Worker col in df with the Worker col from df2 In the future more workers will be added to both dataframes so it would be nice if there was a quick and easy way to do this rather than manually doing something like this for each set of initials

df$Worker<-paste(gsub("JBB", "Joe Bloggs/JBB", df$Worker, perl=TRUE))

Perhaps a loop or simply some kind of tidyverse::replace solution

I have tried various joins but they don't work for me.

Have also tried

df %>%
mutate(new_Worker = case_when(df$Worker == df2$Initials ~ df2$Worker)

This gives errors too.

CodePudding user response:

A possible solution:

library(dplyr)

inner_join(df, df2, by = c("Worker" = "Initials"))

#>   Worker Age       Worker.y
#> 1    JBB   4 Joe Bloggs/JBB
#> 2    JDD   5   Jane Doe/JDD
#> 3     MB   6     Mr. Big/MB
#> 4    JBB   4 Joe Bloggs/JBB

CodePudding user response:

A simple solution is to rename the Worker column in df as you do the join

left_join(rename(df, Initials = Worker), 
          df2)

this results in the data.frame with columns 'Initials', 'Worker' and 'Age'. It also assumes that df is the data and that df2 is the lookup list.

I don't think you would use case_when for this example. Presumably there is a large number of Initials.

The other option is to filter and pull the values from df2.

df |>
  mutate(Worker = map_chr(Worker, 
                          ~ filter(df2, Initials == .x) |>  
                                   pull(Worker)
                          )
         )

The map_chr above is needed otherwise the nested filter does not work

  • Related