I have a huge data frame. One of the columns in the data frame is an email address.
In addition, I have a vector of 259 domain extensions (for example: c(".ac",".ad",".ae",".af",".ag",".ai")).
I want to filter my data frame to contain records whose email ends with one of the strings in the extensions list.
I tried several options, but none of them produced the desired result.
df %>%
filter(endsWith(email, extensions))
df %>%
filter(stringr::str_ends(email, extensions))
CodePudding user response:
You can use a regular expression for pattern matching:
ext <- c("ac","ad","ae","af","ag","ai")
df %>%
filter(grepl(sprintf("\\.(%s)$", paste(ext, collapse = '|')), email))
where the sprintf() call builds a valid regex such as
"\\.(ac|ad|ae|af|ag|ai)$"
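Since the vector in the question stores the extensions with a leading dot, here is a minimal sketch of the same idea that strips the dot first and then builds the pattern. The sample addresses are made up for illustration, and base R subsetting is used only so the snippet runs without extra packages; the same grepl() condition can go inside filter() as above.

```r
# Hypothetical sample data; these addresses are invented for illustration.
df <- data.frame(
  email = c("alice@example.ac", "bob@example.com", "carol@example.ai"),
  stringsAsFactors = FALSE
)

# Extensions as given in the question, with the leading dot.
extensions <- c(".ac", ".ad", ".ae", ".af", ".ag", ".ai")

# Drop the leading dot, join the alternatives with "|",
# and anchor the match at the end of the string.
ext <- sub("^\\.", "", extensions)
pattern <- sprintf("\\.(%s)$", paste(ext, collapse = "|"))

# Keep only the rows whose email ends with one of the extensions.
result <- df[grepl(pattern, df$email), , drop = FALSE]
print(result)
```

If an extension could contain regex metacharacters other than the leading dot, it would need escaping before being pasted into the pattern; the two-letter codes shown in the question do not.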
CodePudding user response:
Here's an option using dplyr:
library(dplyr)
email <- data.frame(
email = c("[email protected]", "[email protected]", "[email protected]")
)
extensions <- c(".ac",".ad",".ae",".af",".ag",".ai")
email %>%
mutate(ext = paste0(".", sub('.*\\.', '', email))) %>%
filter(ext %in% extensions)
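To see the extraction step on its own, here is a small base R sketch of the same logic, assuming some made-up addresses (they are hypothetical, not data from the question). It takes everything after the last dot, puts the dot back, and checks membership with %in%.

```r
# Hypothetical addresses, invented for illustration only.
df <- data.frame(
  email = c("alice@example.ac", "bob@example.com", "carol@mail.example.ai"),
  stringsAsFactors = FALSE
)
extensions <- c(".ac", ".ad", ".ae", ".af", ".ag", ".ai")

# The greedy ".*\\." removes everything up to and including the last dot,
# so only the final label remains; paste0() puts the dot back.
ext <- paste0(".", sub(".*\\.", "", df$email))

# Keep the rows whose extracted extension appears in the list.
result <- df[ext %in% extensions, , drop = FALSE]
print(result)
```

Because only the text after the last dot is compared, this variant matches single-label extensions; a multi-part entry such as ".co.uk" in the list would never match, whereas the regex approach would handle it.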