mydata <- data.frame(id = 1:5,
item1 = c("", "", "1222", "1222", ""),
item2 = c("13", "", "", "", "382"))
> mydata
id item1 item2
1 1 13
2 2
3 3 1222
4 4 987
5 5 382
I have a dataset where it contains different codes. I want to map these codes to strings based on a dictionary
dictionary <- data.frame(code = c(1, 13, 382, 987, 1222),
entry = c("ballet", "soccer", "basketball", "painting", "pottery"))
> dictionary
code entry
1 1 ballet
2 13 soccer
3 382 basketball
4 987 painting
5 1222 pottery
The desired output is a data.frame with strings:
id item1 item2
1 1 soccer
2 2
3 3 pottery
4 4 pottery
5 5 basketball
CodePudding user response:
mydata <- data.frame(id = 1:5,
item1 = c("", "", "1222", "1222", ""),
item2 = c("13", "", "", "", "382"))
dictionary <- data.frame(code = c(1, 13, 382, 987, 1222),
entry = c("ballet", "soccer", "basketball", "painting", "pottery"))
mydata$item1 <- ifelse(mydata$item1 %in% dictionary$code,
dictionary$entry[match( mydata$item1,dictionary$code)], "")
mydata$item2 <- ifelse(mydata$item2 %in% dictionary$code,
dictionary$entry[match( mydata$item2,dictionary$code)], "")
mydata
#> id item1 item2
#> 1 1 soccer
#> 2 2
#> 3 3 pottery
#> 4 4 pottery
#> 5 5 basketball
Created on 2022-12-15 with reprex v2.0.2
NOTE:
Of course, a canonical way to do this is to use factors (and NA's instead of empty strings):
factor(mydata$item1, dictionary$code, dictionary$entry)
# <NA> <NA> pottery pottery <NA>
CodePudding user response:
May be a bit easier using a named vector as your dictionary. Using dplyr:
library(dplyr)
dictionary_vec <- setNames(dictionary$entry, dictionary$code)
mydata %>%
mutate(across(!id, ~ replace_na(dictionary_vec[.x], "")))
Or in base R:
dictionary_vec <- setNames(dictionary$entry, dictionary$code)
for (cn in colnames(mydata)[-1]) {
mydata[[cn]] <- dictionary_vec[mydata[[cn]]]
}
mydata[is.na(mydata)] <- ""
Result from either approach:
id item1 item2
1 1 soccer
2 2
3 3 pottery
4 4 pottery
5 5 basketball
CodePudding user response:
This works pretty well.
mydata <- data.frame(id = 1:5,
item1 = c("", "", "1222", "1222", ""),
item2 = c("13", "", "", "", "382"))
dictionary <- data.frame(code = c(1, 13, 382, 987, 1222),
entry = c("ballet", "soccer", "basketball", "painting", "pottery"))
mydata$item1[mydata$item1 %in% dictionary$code]<-dictionary$entry[dictionary$code%in%mydata$item1]
mydata$item2[mydata$item2 %in% dictionary$code]<-dictionary$entry[dictionary$code%in%mydata$item2]
mydata
id item1 item2
1 1 soccer
2 2
3 3 pottery
4 4 pottery
5 5 basketball
CodePudding user response:
Elaborating on the "canonical way" using factor
s.
mydata[-1] <- lapply(mydata[-1], factor, levels=dictionary$code, labels=dictionary$entry) |>
lapply(droplevels)
mydata
# id item1 item2
# 1 1 <NA> soccer
# 2 2 <NA> <NA>
# 3 3 pottery <NA>
# 4 4 pottery <NA>
# 5 5 <NA> basketball
The |> lapply(droplevels)
cares for having just the levels that appear in each column. If you want character just replace it with |> lapply(as.character)