I would like to extract, from a column, the names that match entries in a large character vector. In some cases the extracted string is incomplete, apparently because of whitespace: "New York" comes back as "New".
Here is a reproducible example:
library(stringr)
library(dplyr)
library(tidyr)
library(stringi)
data <- data.frame(address = c("to New York street", "to New cafe", "to Paris avenue", "to London hostel"))
search_string <- c("London", "Paris", "New", "New York") %>% paste(collapse = " |to ")
data %>% dplyr::mutate(temp_com = str_extract_all(paste(address), search_string))
This is the result:
             address temp_com
1 to New York street   to New
2        to New cafe   to New
3    to Paris avenue to Paris
4   to London hostel   London
And this is what I would like:
             address    temp_com
1 to New York street to New York
2        to New cafe      to New
3    to Paris avenue    to Paris
4   to London hostel      London
Thank you very much for your help.
CodePudding user response:
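Order your search strings from longest to shortest. Regex alternation tries the alternatives left to right and stops at the first one that matches, so "New" wins before "New York" is ever tried. (Also, I'm inferring you intend to have "to " before your first search string; it is omitted in your current example.)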
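To see why the order matters, here is a minimal sketch on a single string. The variable names `x`, `short_first` and `long_first` are just illustrative; the only assumption is that stringr is installed.

```r
library(stringr)

x <- "to New York street"

# Shorter alternative listed first: the engine stops at "to New".
short_first <- str_extract(x, "to New|to New York")

# Longer alternative listed first: the full name is matched.
long_first <- str_extract(x, "to New York|to New")

print(short_first)
print(long_first)
```

The first call returns "to New" and the second returns "to New York", which is the same effect you are seeing in your data.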
search_string <- c("London","Paris", "New", "New York")
search_string <- paste(paste("to", search_string[order(-nchar(search_string))]), collapse = "|")
search_string
# [1] "to New York|to London|to Paris|to New"
data %>%
  dplyr::mutate(temp_com = str_extract_all(paste(address), search_string))
#              address    temp_com
# 1 to New York street to New York
# 2        to New cafe      to New
# 3    to Paris avenue    to Paris
# 4   to London hostel   to London
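As a side note, `str_extract_all()` returns a list column. If each address contains at most one name (an assumption on my part, based on your example), `str_extract()` may be more convenient, since it returns a plain character vector. A possible sketch, where `names_vec`, `sorted_names` and `pattern` are names I made up for illustration:

```r
library(stringr)

data <- data.frame(address = c("to New York street", "to New cafe",
                               "to Paris avenue", "to London hostel"))
names_vec <- c("London", "Paris", "New", "New York")

# Longest names first, so "New York" is tried before "New".
sorted_names <- names_vec[order(-nchar(names_vec))]
pattern <- paste(paste("to", sorted_names), collapse = "|")

# str_extract() returns the first match per row as a character vector
# (assumes at most one name of interest per address).
data$temp_com <- str_extract(data$address, pattern)
print(data)
```

This builds the same pattern as above; only the extraction function differs. If your names can contain regex metacharacters (such as "." or "("), they would need escaping first, which this sketch does not do.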