R replace string in df with partial match in a list

Time:04-14

I have a data frame (df) in R and I want to create a new column (city1_n) that contains the element of the list key whenever there is a partial match between City1 and that element. Below is a small example that should help visualize the problem.

> dput(df)
structure(list(Country = c("USA", "France", "Italy", "Spain", 
"Mexico"), City1 = c("Los angeles", "Paris", "Rome", "Madrid", 
"Cancun"), City2 = c("New York", "Lyon", "Pisa", "Barcelona", 
"San Cristobal de las Casas")), class = "data.frame", row.names = c(NA, 
-5L))

> dput(key)
list("Los angeles California", "Paris Île-de-France", "Rome Lazio", 
    "Madrid Comunidad de Madrid ", "Cancun Quintana Roo")

Result: (image not available)

Any help in R or unix will be appreciated. Thanks

Answer:

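Use fuzzyjoin::fuzzy_left_join() with a match function that checks whether each key entry contains the City1 value. unlist() turns key into a character vector whether it is a list (as in the question) or already a vector (as in the data section at the end):

# \(x, y) is the R >= 4.1 shorthand for function(x, y)
fuzzyjoin::fuzzy_left_join(df, data.frame(key = unlist(key)), by = c("City1" = "key"), match_fun = \(x, y) stringr::str_detect(y, x))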

  Country       City1                      City2                         key
1     USA Los angeles                   New York      Los angeles California
2  France       Paris                       Lyon         Paris Île-de-France
3   Italy        Rome                       Pisa                  Rome Lazio
4   Spain      Madrid                  Barcelona Madrid Comunidad de Madrid 
5  Mexico      Cancun San Cristobal de las Casas         Cancun Quintana Roo
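If you would rather avoid extra packages, the same partial match can be sketched in base R. This is only a minimal illustration, assuming each city should get the first key entry that contains it as a literal, case-sensitive substring; first_match and key_vec are hypothetical names made up for this example, and the result goes into the city1_n column the question asks for.

```r
# Example data copied from the question; `key` is a list there, so flatten it.
df <- data.frame(
  Country = c("USA", "France", "Italy", "Spain", "Mexico"),
  City1   = c("Los angeles", "Paris", "Rome", "Madrid", "Cancun"),
  City2   = c("New York", "Lyon", "Pisa", "Barcelona",
              "San Cristobal de las Casas")
)
key <- list("Los angeles California", "Paris Île-de-France", "Rome Lazio",
            "Madrid Comunidad de Madrid ", "Cancun Quintana Roo")
key_vec <- unlist(key)

# Hypothetical helper: return the first key entry that contains `city`
# as a literal substring, or NA when nothing matches.
first_match <- function(city, keys) {
  hits <- keys[grepl(city, keys, fixed = TRUE)]
  if (length(hits) > 0) hits[1] else NA_character_
}

# Apply the helper to every City1 value and store the result in city1_n.
df$city1_n <- vapply(df$City1, first_match, character(1),
                     keys = key_vec, USE.NAMES = FALSE)
print(df)
```

With fixed = TRUE the city name is not interpreted as a regular expression, so names containing characters such as "." or "(" are matched literally. Cities with no matching key entry end up as NA, which mirrors what a left join would give.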

data

df <- structure(list(
  Country = c("USA", "France", "Italy", "Spain", "Mexico"),
  City1 = c("Los angeles", "Paris", "Rome", "Madrid", "Cancun"),
  City2 = c("New York", "Lyon", "Pisa", "Barcelona", "San Cristobal de las Casas")
), class = "data.frame", row.names = c(NA, -5L))

key <- c("Los angeles California", "Paris Île-de-France", "Rome Lazio",
         "Madrid Comunidad de Madrid ", "Cancun Quintana Roo")