I am doing a fuzzy name-matching exercise and am trying to reduce the number of spelling variations of the same name using tidystringdist. I end up with a data frame of matches containing two columns: one has the original value and the other has the value it needs to be changed into. So I need to go back to the original vector of names and change them based on the data frame of match values. Normally this would be easy: left_join() on the original names and done. But my original names can hold anywhere from 1 to 4 values (multiple owners on a property), so the values to be changed are actually a list of character vectors. Here is a reprex of what I have done so far:
library(dplyr)
data_to_change <- data.frame(house_number = c(1,2,3),
animal = rbind(c("dog|cat|monkey"),
c("goldfish"),
c("mouse|dog|rabbit|squirrel"))) %>%
mutate(animal_split = strsplit(animal, "[|]"))
new_names <- data.frame(cbind(V1 = c("dog", "rabbit"),
V2 = c("doggy", "bunny")))
The animal_split column of the original data looks like this:
[[1]]
[1] "dog" "cat" "monkey"
[[2]]
[1] "goldfish"
[[3]]
[1] "mouse" "dog" "rabbit" "squirrel"
And I would like to change the animal names so the result looks like this:
[[1]]
[1] "doggy" "cat" "monkey"
[[2]]
[1] "goldfish"
[[3]]
[1] "mouse" "doggy" "bunny" "squirrel"
I don't believe I can simply use replace(), because the target list and the match data frame are of different lengths. And I don't think I can unlist it and change it, because I need to preserve the association with the house number and the other animals in the house.
CodePudding user response:
You can use lapply() to loop over your list, and stringi::stri_replace_all_fixed() to replace the text.
library(stringi)
# vectorize_all = FALSE applies every pattern/replacement pair to each string
data_to_change$animal_split <- lapply(data_to_change$animal_split, stri_replace_all_fixed,
                                      new_names$V1, new_names$V2, vectorize_all = FALSE)
data_to_change$animal_split
[[1]]
[1] "doggy" "cat" "monkey"
[[2]]
[1] "goldfish"
[[3]]
[1] "mouse" "doggy" "bunny" "squirrel"
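One caveat: stri_replace_all_fixed() replaces substrings, so a name such as "bulldog" would come out as "bulldoggy". If whole-element matches are what you need, a minimal base R sketch could look like the following. The helper name replace_exact is hypothetical (not part of any package), and the inputs are rebuilt from the question's reprex.

```r
# Rebuild the inputs from the question using base R only.
animal_split <- strsplit(c("dog|cat|monkey", "goldfish", "mouse|dog|rabbit|squirrel"),
                         "|", fixed = TRUE)
new_names <- data.frame(V1 = c("dog", "rabbit"),
                        V2 = c("doggy", "bunny"),
                        stringsAsFactors = FALSE)

# Hypothetical helper: replace only elements that are exactly equal to a value in `from`.
replace_exact <- function(x, from, to) {
  idx <- match(x, from)     # position of each element in `from`, NA if absent
  hit <- !is.na(idx)        # TRUE where a replacement exists
  x[hit] <- to[idx[hit]]    # swap in the new spelling, leave the rest untouched
  x
}

result <- lapply(animal_split, replace_exact,
                 from = new_names$V1, to = new_names$V2)
```

Because match() compares whole strings, replace_exact("bulldog", "dog", "doggy") leaves "bulldog" unchanged.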
CodePudding user response:
As these are fixed matches, we can use deframe to convert the data.frame into a named vector, then use that vector to match and replace the elements of each vector in the list by looping over it with map, and finally coalesce with the original vector so that the NAs are replaced by the original values.
library(dplyr)
library(tibble)
library(purrr)
data_to_change %>%
  mutate(animal_split = map(animal_split,
                            ~ coalesce(deframe(new_names)[.x], .x)))
-output
  house_number                    animal                  animal_split
1            1            dog|cat|monkey            doggy, cat, monkey
2            2                  goldfish                      goldfish
3            3 mouse|dog|rabbit|squirrel mouse, doggy, bunny, squirrel
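To see what the lookup does for a single vector, here is a minimal sketch of the same idea in base R. It swaps in setNames() for deframe() and ifelse() for coalesce(), so it illustrates the mechanism rather than the exact pipeline above; the variable names lookup, found and replaced are only for illustration.

```r
new_names <- data.frame(V1 = c("dog", "rabbit"),
                        V2 = c("doggy", "bunny"),
                        stringsAsFactors = FALSE)

# Named vector: names are the old spellings, values are the new ones
# (the same shape that tibble::deframe(new_names) returns).
lookup <- setNames(new_names$V2, new_names$V1)

x <- c("mouse", "dog", "rabbit", "squirrel")

# Indexing by name returns NA for any name with no entry in the lookup.
found <- unname(lookup[x])

# Fall back to the original value wherever the lookup returned NA,
# which is the role coalesce() plays in the pipeline.
replaced <- ifelse(is.na(found), x, found)
```

Here found is NA for "mouse" and "squirrel", so those two keep their original spelling while "dog" and "rabbit" are replaced.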