R - Replace items in a list based on another vector

Time:03-12

I am doing a fuzzy name-matching exercise and am trying to reduce the number of spelling variations of the same name using tidystringdist. I end up with a data frame of matches that has two columns: one holds the original value and the other holds the value it should be changed into. So I need to go back to the original vector of names and change them based on this data frame of matches. Normally this would be easy: left_join() on the original names and done. But each original record can hold anywhere from 1 to 4 names (properties with multiple owners), so the values to be changed are actually a list of character vectors. Here is a reprex of what I have done so far:

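library(dplyr)

# stringsAsFactors = FALSE keeps the columns character in R < 4.0 as well
data_to_change <- data.frame(house_number = c(1, 2, 3),
                             animal = c("dog|cat|monkey",
                                        "goldfish",
                                        "mouse|dog|rabbit|squirrel"),
                             stringsAsFactors = FALSE) %>%
  # split each house's "|"-separated names into a character vector
  mutate(animal_split = strsplit(animal, "[|]"))

# lookup table: V1 = spelling to replace, V2 = replacement
new_names <- data.frame(V1 = c("dog", "rabbit"),
                        V2 = c("doggy", "bunny"),
                        stringsAsFactors = FALSE)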

The animal_split column of the original data looks like this:

[[1]]
[1] "dog"    "cat"    "monkey"

[[2]]
[1] "goldfish"

[[3]]
[1] "mouse"    "dog"      "rabbit"   "squirrel"

I would like to change the animal names so that the result looks like this:

[[1]]
[1] "doggy"  "cat"    "monkey"

[[2]]
[1] "goldfish"

[[3]]
[1] "mouse"    "doggy"    "bunny"    "squirrel"

I don't believe I can simply use replace(), because the list elements and the lookup data frame have different lengths. And I don't think I can unlist the column and change it, because I need to preserve the association with the house number and with the other animals in the same house.
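Whether unlisting really loses the association can be checked with a minimal base-R sketch: flatten the list, replace elements by exact match() lookup, and rebuild the original grouping with utils::relist(), which uses the original list as a template. The vectors from and to are illustrative stand-ins for new_names$V1 and new_names$V2, and the list is typed out by hand rather than taken from data_to_change.

```r
# Base R only: flatten, replace by exact match, then rebuild the grouping.
animal_split <- list(
  c("dog", "cat", "monkey"),
  "goldfish",
  c("mouse", "dog", "rabbit", "squirrel")
)
from <- c("dog", "rabbit")   # stand-in for new_names$V1 (old spellings)
to   <- c("doggy", "bunny")  # stand-in for new_names$V2 (replacements)

flat <- unlist(animal_split)      # one long character vector, order preserved
idx  <- match(flat, from)         # position in `from`, or NA if no match
hit  <- !is.na(idx)
flat[hit] <- to[idx[hit]]         # replace only the matched elements

# relist() walks the original list as a skeleton, so each house gets back
# exactly as many elements as it started with, in the same order
result <- utils::relist(flat, skeleton = animal_split)
result
```

With the reprex, the same idea would be applied to data_to_change$animal_split and the result assigned back to that column.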

Answer:

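You can use lapply() to loop over your list, and use stringi::stri_replace_all_fixed() to replace the text. With vectorize_all = FALSE, every pattern/replacement pair is applied to every element, one pair after another.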

library(stringi)

data_to_change$animal_split <- lapply(data_to_change$animal_split, stri_replace_all_fixed,
                                      pattern = new_names$V1, replacement = new_names$V2,
                                      vectorize_all = FALSE)

data_to_change$animal_split
[[1]]
[1] "doggy"  "cat"    "monkey"

[[2]]
[1] "goldfish"

[[3]]
[1] "mouse"    "doggy"    "bunny"    "squirrel"
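One thing worth checking with this approach: stri_replace_all_fixed() matches substrings, not whole elements, so a pattern such as "dog" also rewrites any longer name that contains it. The sketch below uses a made-up name, "hotdog", that is not in the question's data, and shows an anchored-regex variant that only replaces whole elements. The anchored variant assumes the names contain no regex metacharacters; otherwise they would need escaping first.

```r
library(stringi)

x <- c("dog", "hotdog", "rabbit")   # "hotdog" is a hypothetical extra name

# Fixed matching: each pattern is replaced wherever it occurs as a substring
res <- stri_replace_all_fixed(x, c("dog", "rabbit"), c("doggy", "bunny"),
                              vectorize_all = FALSE)
res   # the "dog" inside "hotdog" is replaced as well

# Whole-element variant: anchor each pattern with ^ and $
res_exact <- stri_replace_all_regex(x, paste0("^", c("dog", "rabbit"), "$"),
                                    c("doggy", "bunny"),
                                    vectorize_all = FALSE)
res_exact   # "hotdog" is left alone
```

For the animal names in the question this makes no difference, but for free-text owner names, partial matches like this are easy to miss.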

Answer:

As these are exact, whole-element matches, we can use deframe() to convert the data frame into a named vector (names from V1, values from V2). Then we loop over the list with map(), index that named vector by each element, and finally coalesce() the result with the original vector, so that elements without a match (which come back as NA) keep their original value.

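library(dplyr)
library(tibble)
library(purrr)

# named vector: names are the old spellings (V1), values the replacements (V2)
lookup <- deframe(new_names)

data_to_change %>%
  mutate(animal_split = map(animal_split,
      # lookup[.x] is NA where there is no match; coalesce() falls back to .x
      ~ unname(coalesce(lookup[.x], .x))))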

Output:

  house_number                    animal                  animal_split
1            1            dog|cat|monkey            doggy, cat, monkey
2            2                  goldfish                      goldfish
3            3 mouse|dog|rabbit|squirrel mouse, doggy, bunny, squirrel
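To see what deframe() and coalesce() each contribute, the same named-vector lookup can be sketched in base R. Here setNames() stands in for deframe() and an is.na() fallback stands in for coalesce(); the helper name swap_names() is hypothetical and the list is typed out by hand.

```r
new_names <- data.frame(V1 = c("dog", "rabbit"),
                        V2 = c("doggy", "bunny"),
                        stringsAsFactors = FALSE)

# Base-R equivalent of tibble::deframe(new_names): values V2, named by V1
lookup <- setNames(new_names$V2, new_names$V1)

# Hypothetical helper mirroring ~ coalesce(deframe(new_names)[.x], .x)
swap_names <- function(v, lookup) {
  hits <- lookup[v]           # indexing by name: NA where v has no entry
  out  <- v
  keep <- !is.na(hits)
  out[keep] <- hits[keep]     # replace matched elements, keep the rest
  out
}

animal_split <- list(c("dog", "cat", "monkey"),
                     "goldfish",
                     c("mouse", "dog", "rabbit", "squirrel"))
result <- lapply(animal_split, swap_names, lookup = lookup)
result
```

Because the lookup is by name, only whole elements are replaced, so a name like "hotdog" would be left untouched unless it had its own row in new_names.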