I have a data frame (df) in R and I want to create a new column (city1_n) that contains the matching entry from the list key whenever there is a partial match between City1 and an entry of key. Below is a small example that should help visualize the problem.
> dput(df)
structure(list(Country = c("USA", "France", "Italy", "Spain",
"Mexico"), City1 = c("Los angeles", "Paris", "Rome", "Madrid",
"Cancun"), City2 = c("New York", "Lyon", "Pisa", "Barcelona",
"San Cristobal de las Casas")), class = "data.frame", row.names = c(NA,
-5L))
> dput(key)
list("Los angeles California", "Paris Île-de-France", "Rome Lazio",
"Madrid Comunidad de Madrid ", "Cancun Quintana Roo")
Any help in R or unix will be appreciated. Thanks
Answer:
Use fuzzyjoin::fuzzy_left_join with a custom match function (str_detect comes from the stringr package):
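# unlist() lets key be either a list (as in the question) or a character vector
fuzzyjoin::fuzzy_left_join(df, data.frame(key = unlist(key)), by = c("City1" = "key"), match_fun = \(x, y) stringr::str_detect(y, x))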
  Country       City1                      City2                         key
1     USA Los angeles                   New York      Los angeles California
2  France       Paris                       Lyon         Paris Île-de-France
3   Italy        Rome                       Pisa                  Rome Lazio
4   Spain      Madrid                  Barcelona Madrid Comunidad de Madrid 
5  Mexico      Cancun San Cristobal de las Casas         Cancun Quintana Roo
data
df <- structure(list(Country = c("USA", "France", "Italy", "Spain",
"Mexico"), City1 = c("Los angeles", "Paris", "Rome", "Madrid",
"Cancun"), City2 = c("New York", "Lyon", "Pisa", "Barcelona",
"San Cristobal de las Casas")), class = "data.frame", row.names = c(NA,
-5L))
key <- c("Los angeles California", "Paris Île-de-France", "Rome Lazio",
"Madrid Comunidad de Madrid ", "Cancun Quintana Roo")
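If you would rather avoid an extra package, the same partial match can probably be done in base R. The following is only a sketch under a few assumptions: the match is a literal, case-sensitive substring test, only the first matching entry of key is kept, and rows without a match get NA. The helper name first_match is made up for illustration; the new column is named city1_n as in the question.

```r
# Data copied from the question (key is a list there)
df <- data.frame(
  Country = c("USA", "France", "Italy", "Spain", "Mexico"),
  City1 = c("Los angeles", "Paris", "Rome", "Madrid", "Cancun"),
  City2 = c("New York", "Lyon", "Pisa", "Barcelona", "San Cristobal de las Casas")
)
key <- list("Los angeles California", "Paris Île-de-France", "Rome Lazio",
            "Madrid Comunidad de Madrid ", "Cancun Quintana Roo")

# Flatten so the code works whether key is a list or a character vector
key_chr <- unlist(key)

# Hypothetical helper: first entry of `keys` that contains `city`, else NA.
# fixed = TRUE treats `city` as a literal string, not a regular expression.
first_match <- function(city, keys) {
  hits <- keys[grepl(city, keys, fixed = TRUE)]
  if (length(hits) > 0) hits[1] else NA_character_
}

# One lookup per row of City1; the result is a plain character column
df$city1_n <- vapply(df$City1, first_match, character(1),
                     keys = key_chr, USE.NAMES = FALSE)
```

Note that the trailing space in "Madrid Comunidad de Madrid " is carried over as-is; wrapping key_chr in trimws() would remove it if that is not wanted.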