df1=c("blue,green,green", "blue", "green")
dictionary=data.frame(label=c("blue","green"),value=c(1,2))
want=c("1,2,2","1","2")
I am looking to replace some data using a data dictionary. The data (df1) may have multiple entries in one cell, separated by commas. I have looked at str_replace_all and was able to do it with
str_replace_all(df1, c("blue"="1","green"="2"))
but I was unable to create c("blue"="1","green"="2") from the dictionary data frame. I have over a hundred items to code, so hard-coding is not an option.
Any guidance on how to make that work or another way around would be greatly appreciated!
CodePudding user response:
Create a named vector from dictionary and pass it to str_replace_all:
library(dplyr)
library(stringr)
library(tibble)
dictionary %>%
  mutate(value = as.character(value)) %>%
  deframe() %>%
  str_replace_all(df1, .)
#[1] "1,2,2" "1" "2"
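If you would rather not load dplyr and tibble just to build the lookup, the named vector can also be made with setNames(). This is a minimal sketch using the data from the question; the names lookup and recoded are only illustrative. Keep in mind that str_replace_all treats the names as regular expressions, so labels containing special characters, or labels that are substrings of other labels, may need extra care.

```r
library(stringr)

df1 <- c("blue,green,green", "blue", "green")
dictionary <- data.frame(label = c("blue", "green"), value = c(1, 2))

# Named character vector: names are the patterns, values are the replacements.
# This is the same shape as c("blue" = "1", "green" = "2").
lookup <- setNames(as.character(dictionary$value), dictionary$label)

recoded <- str_replace_all(df1, lookup)
recoded
#[1] "1,2,2" "1"     "2"
```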
CodePudding user response:
Here is a base R option:
rename <- with(dictionary, setNames(value, label))
lapply(strsplit(df1, ","), \(x) unname(rename[x])) |>
  lapply(\(x) paste(x, collapse = ",")) |>
  unlist()
#[1] "1,2,2" "1" "2"
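With over a hundred labels, some cells may contain an entry that is missing from the dictionary, and indexing the named vector would then give NA. The sketch below wraps the same split/lookup/paste idea in a helper that leaves unknown entries unchanged. The function name recode_cells and the "red" entry are hypothetical, added only for illustration.

```r
df1 <- c("blue,green,green", "blue", "green")
dictionary <- data.frame(label = c("blue", "green"), value = c(1, 2))

# Hypothetical helper: recode comma-separated entries via a dictionary,
# keeping any entry that has no match in the dictionary as it is.
recode_cells <- function(x, dict_df, sep = ",") {
  lookup <- setNames(as.character(dict_df$value), dict_df$label)
  vapply(
    strsplit(x, sep, fixed = TRUE),
    function(parts) {
      parts <- trimws(parts)                     # tolerate "blue, green"
      hit <- match(parts, names(lookup))         # NA where no label matches
      out <- ifelse(is.na(hit), parts, lookup[hit])
      paste(out, collapse = sep)
    },
    character(1)
  )
}

recode_cells(df1, dictionary)
#[1] "1,2,2" "1"     "2"

# An entry that is not in the dictionary is passed through untouched
recode_cells("green,red", dictionary)
#[1] "2,red"
```

Because this matches whole entries rather than patterns, a label that is a substring of another label is not replaced by accident.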
CodePudding user response:
Base R option 1, using nested vapply() calls:
# dict => named numeric vector mapping label -> value
dict <- setNames(dictionary$value, dictionary$label)
# Split each element on commas, look up each label, and re-join:
# df1_replaced => character vector
df1_replaced <- vapply(
  df1,
  function(x) {
    y <- strsplit(x, ",")
    vapply(
      y,
      function(z) {
        # paste(collapse = ",") gives "1,2,2"; toString() would give "1, 2, 2"
        paste(dict[match(z, names(dict))], collapse = ",")
      },
      character(1)
    )
  },
  character(1),
  USE.NAMES = FALSE
)