Apply value replacement using data dictionary

Time:10-15

df1=c("blue,green,green", "blue", "green")
dictionary=data.frame(label=c("blue","green"),value=c(1,2))
want=c("1,2,2","1","2")

I am looking to replace some data using a data dictionary. The data (df1) may have multiple entries in one cell, separated by commas. I looked at str_replace_all and got it to work with str_replace_all(df1, c("blue"="1","green"="2")), but I was unable to create c("blue"="1","green"="2") from the dictionary data frame. I have over a hundred items to code, so hard-coding is not an option. Any guidance on how to make that work, or another way around it, would be greatly appreciated!

CodePudding user response:

Create a named vector from dictionary and pass it to str_replace_all:

library(dplyr)
library(stringr)
library(tibble)
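dictionary %>% 
   mutate(value = as.character(value)) %>%  # replacements must be character
   deframe() %>%                            # label/value columns -> c(blue = "1", green = "2")
   str_replace_all(df1, .)                  # names are patterns, elements are replacements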
#[1] "1,2,2" "1"     "2"   
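
One caveat worth noting: str_replace_all treats the names of the vector as regular expressions and applies them one after another, so a label that is a substring of another label could be partially replaced. Below is a minimal sketch of one way around this, assuming the labels contain only letters and digits: wrap each label in word boundaries. The extra "bluegreen" row and the names dictionary2, x, and patterns are made up for illustration.

```r
library(stringr)

# Hypothetical dictionary in which one label ("blue") is a prefix of another
dictionary2 <- data.frame(label = c("blue", "green", "bluegreen"),
                          value = c(1, 2, 3))
x <- c("bluegreen,blue", "green")

# Names are regex patterns wrapped in word boundaries; elements are the replacements
patterns <- setNames(as.character(dictionary2$value),
                     paste0("\\b", dictionary2$label, "\\b"))

str_replace_all(x, patterns)
#[1] "3,1" "2"
```

The replacements still run in sequence, so this sketch does not help if a code is itself one of the labels.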

CodePudding user response:

Here is a base R option:

rename <- with(dictionary, setNames(value, label))

lapply(strsplit(df1, ","), \(x) unname(rename[x])) |>
  lapply(\(x) paste(x, collapse = ",")) |> 
  unlist()

#[1] "1,2,2" "1"     "2"
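
The split, look up, and paste steps above could also be wrapped in a small reusable function. This is only a sketch: the name recode_cells and its sep argument are hypothetical, and it assumes the dictionary has label and value columns as in the question.

```r
# Hypothetical helper: recode delimiter-separated cells using a lookup table
recode_cells <- function(x, dictionary, sep = ",") {
  # Named character vector: names are labels, elements are codes
  lookup <- setNames(as.character(dictionary$value),
                     as.character(dictionary$label))
  vapply(strsplit(x, sep, fixed = TRUE), function(parts) {
    parts <- trimws(parts)          # tolerate "blue, green"
    codes <- unname(lookup[parts])  # NA where a label is not in the dictionary
    paste(codes, collapse = sep)    # unmatched labels end up as the text "NA"
  }, character(1))
}

df1 <- c("blue,green,green", "blue", "green")
dictionary <- data.frame(label = c("blue", "green"), value = c(1, 2))

recode_cells(df1, dictionary)
#[1] "1,2,2" "1"     "2"
```

Labels missing from the dictionary show up as "NA" in the result, which makes gaps in a long dictionary easy to spot.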

CodePudding user response:

Base R option 1, using nested vapply() calls:

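# dict => named numeric vector (names are labels, elements are values)
dict <- setNames(dictionary$value, dictionary$label)

# Loop through and replace values: df1_replaced => character vector
df1_replaced <- vapply(
   df1,
   function(x){
      # y => list of length 1 holding the labels found in this cell
      y <- strsplit(x, ",")
      vapply(
         y,
         function(z){
            # paste(collapse = ",") rather than toString(), which would
            # insert a space after each comma ("1, 2, 2")
            paste(dict[match(z, names(dict))], collapse = ",")
         },
         character(1)
      )
   },
   character(1),
   USE.NAMES = FALSE
)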