In R, how to replace values in a column with values of another column of another data set based on a

Time:11-24

I have two data sets, samples of which are given below. I need to replace the project names in target_df$project_name with the corresponding values in registry_df$replacement whenever they appear in registry_df$to_change. The code I tried did not produce the desired result. How should it be corrected, or is there another way to achieve this?

Data sets:

target_df <- tibble::tribble(
  ~project_name,     ~sum,   
  "Mark",            "4307",     
  "Boat",            "9567",       
  "Delorean",        "5344",      
  "Parix",           "1043",
)

registry_df <- tibble::tribble(
  ~to_change,     ~replacement,   
  "Mark",            "Duck",     
  "Boat",            "Tank",       
  "Toloune",         "Bordeaux",      
  "Hunge",           "Juron",
)

Desired output of target_df :

project_name        sum   
  "Duck"            "4307"     
  "Tank"            "9567"       
  "Delorean"        "5344"      
  "Parix"           "1043"

Code:

library(data.table)

target_df <- transform(target_df, 
                       project_name = ifelse(target_df$project_name %in% registry_df$to_change),
                       registry_df$replacement,
                       project_name
)

CodePudding user response:

A dplyr solution. There's probably a more elegant way with fewer steps.

library(dplyr)

target_df <- target_df %>% 
  left_join(registry_df,  
            by = c("project_name" = "to_change")) %>% 
  mutate(replacement = ifelse(is.na(replacement), project_name, replacement)) %>% 
  select(project_name = replacement, sum)

Result:

# A tibble: 4 × 2
  project_name sum  
  <chr>        <chr>
1 Duck         4307 
2 Tank         9567 
3 Delorean     5344 
4 Parix        1043
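
One possible way to cut the steps, sketched here as an assumption rather than as part of the answer above: build a named lookup vector from registry_df and let dplyr::coalesce fall back to the original name where no replacement exists. The names lookup and result are made up for illustration.

```r
library(dplyr)

# Sample data from the question
target_df <- tibble::tribble(
  ~project_name, ~sum,
  "Mark",        "4307",
  "Boat",        "9567",
  "Delorean",    "5344",
  "Parix",       "1043"
)

registry_df <- tibble::tribble(
  ~to_change, ~replacement,
  "Mark",     "Duck",
  "Boat",     "Tank",
  "Toloune",  "Bordeaux",
  "Hunge",    "Juron"
)

# Hypothetical helper: named vector mapping old names to new names
lookup <- setNames(registry_df$replacement, registry_df$to_change)

# Indexing by name yields NA for names not in the registry;
# coalesce() then keeps the original project_name in those rows
result <- target_df %>%
  mutate(project_name = coalesce(unname(lookup[project_name]), project_name))

print(result)
```

This avoids the extra replacement column and the select step, at the cost of assuming that registry_df$to_change contains no duplicated names.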

CodePudding user response:

A base R solution: you can match the columns using the match function. Since not all values of target_df$project_name are in registry_df$to_change, the matched vector will contain NAs. Therefore, I included the ifelse function, which keeps the original values wherever the match is NA.

matching <- registry_df$replacement[match(target_df$project_name, registry_df$to_change)]
target_df$project_name <- ifelse(is.na(matching),
                                 target_df$project_name,
                                 matching)

target_df gives expected output:

  project_name sum  
  <chr>        <chr>
1 Duck         4307 
2 Tank         9567 
3 Delorean     5344 
4 Parix        1043 
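
As for correcting the original attempt, here is a minimal sketch under a couple of assumptions. In the question's code the closing parenthesis of ifelse comes right after the test, so the yes and no arguments end up being passed to transform instead. Even with the parenthesis moved, registry_df$replacement is not aligned row by row with target_df, so it still needs to be indexed through match. The names idx and fixed_df are made up for illustration, and plain data frames are used here instead of tibbles to keep the example in base R.

```r
# Sample data from the question, as base R data frames
target_df <- data.frame(
  project_name = c("Mark", "Boat", "Delorean", "Parix"),
  sum          = c("4307", "9567", "5344", "1043"),
  stringsAsFactors = FALSE
)

registry_df <- data.frame(
  to_change   = c("Mark", "Boat", "Toloune", "Hunge"),
  replacement = c("Duck", "Tank", "Bordeaux", "Juron"),
  stringsAsFactors = FALSE
)

# Position of each project name in the registry (NA if absent)
idx <- match(target_df$project_name, registry_df$to_change)

# All three ifelse() arguments now sit inside the call, and the
# replacement vector is reordered to line up with target_df's rows
fixed_df <- transform(
  target_df,
  project_name = ifelse(project_name %in% registry_df$to_change,
                        registry_df$replacement[idx],
                        project_name)
)

print(fixed_df)
```

Note that transform and ifelse are both base R functions, so library(data.table) is not needed for this version.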