Filtering a tibble in R by a list of strings and returning all records that end with one of the strings in the list

Time:12-26

I have a huge data frame, and one of its columns holds email addresses. I also have a vector of 259 domain extensions, for example c(".ac",".ad",".ae",".af",".ag",".ai"). I want to filter the data frame to keep only the records whose email ends with one of the strings in that vector.

I tried several options, but none of them produced the desired result.

# Attempt 1
df %>% 
  filter(endsWith(email, extensions))

# Attempt 2
df %>% 
  filter(stringr::str_ends(email, extensions))
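A minimal base-R sketch of why the first attempt misbehaves, using hypothetical toy vectors (emails, extensions) rather than the real data. endsWith() is vectorized over both arguments and pairs them element by element, recycling the shorter vector, so each email is compared with only one extension instead of all of them. stringr::str_ends() is likewise vectorized over its pattern argument, so the second attempt has the same pairwise problem. Testing every extension separately and combining the results with a logical OR gives the intended "ends with any" check:

```r
# Hypothetical toy data: four addresses, two extensions
emails <- c("a@x.ac", "b@x.ac", "c@x.ad", "d@x.ad")
extensions <- c(".ac", ".ad")

# Pairwise comparison: email 1 vs ".ac", email 2 vs ".ad",
# email 3 vs ".ac" (recycled), email 4 vs ".ad" (recycled)
pairwise <- endsWith(emails, extensions)
print(pairwise)  # only the 1st and 4th are TRUE, although all four should match

# Compare every email with every extension, then OR the results together
any_match <- Reduce(`|`, lapply(extensions, function(e) endsWith(emails, e)))
print(any_match)
```

Inside dplyr, the same expression could be used as filter(Reduce(`|`, lapply(extensions, function(e) endsWith(email, e)))), assuming the column is called email as in the question.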

User response:

You can collapse all the extensions into a single regular expression and match it against the email column:

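library(dplyr)

# Extensions written without the leading dot
ext <- c("ac","ad","ae","af","ag","ai")

# Keep rows whose email ends with "." followed by one of the extensions
df %>% 
  filter(grepl(sprintf("\\.(%s)$", paste(ext, collapse = '|')), email))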

where the sprintf() call builds a valid regular expression such as

"\\.(ac|ad|ae|af|ag|ai)$"
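A self-contained sketch of the same approach, assuming the question's extensions vector (which keeps the leading dot) and a few hypothetical addresses. The leading dots are stripped with sub() before the pattern is built; in the pattern, \\. matches a literal dot and $ anchors the match to the end of the string. The ignore.case = TRUE argument is an addition not in the answer above, included on the assumption that some addresses may be upper- or mixed-case:

```r
# Hypothetical sample addresses; extensions keep their leading dot as in the question
emails <- c("alice@example.ac", "bob@example.com", "carol@example.AI", "dan@mail.ad")
extensions <- c(".ac", ".ad", ".ae", ".af", ".ag", ".ai")

# Drop the leading dot from each extension, then join them as alternatives
ext <- sub("^\\.", "", extensions)
pattern <- sprintf("\\.(%s)$", paste(ext, collapse = "|"))

# TRUE where the address ends with "." plus one of the extensions
matches <- grepl(pattern, emails, ignore.case = TRUE)
print(emails[matches])
```

Because the dot is part of the pattern, an address such as one ending in ".cai" is not matched by the "ai" alternative.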

User response:

Here's an option using dplyr that extracts each address's final extension and checks it against the vector with %in%:

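library(dplyr)

# Hypothetical sample addresses for illustration
df <- data.frame(
  email = c("alice@example.ac", "bob@example.com", "carol@example.ai")
)

extensions <- c(".ac",".ad",".ae",".af",".ag",".ai")

# Keep everything after the last dot, re-attach the dot,
# and keep rows whose extension appears in the extensions vector
df %>% 
  mutate(ext = paste0(".", sub('.*\\.', '', email))) %>% 
  filter(ext %in% extensions)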
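A base-R sketch of the same extract-and-look-up idea, with hypothetical addresses, showing what the extraction step produces. The greedy .* in sub() consumes everything up to the last dot, so only the final label remains. Since %in% is a wrapper around match(), which uses hashing, the lookup should scale reasonably to a large data frame. One caveat: an extension containing an internal dot (a hypothetical ".co.uk", say) can never match, because only the text after the final dot is extracted.

```r
# Hypothetical sample addresses
emails <- c("alice@example.ac", "bob@example.com", "carol@example.ai")
extensions <- c(".ac", ".ad", ".ae", ".af", ".ag", ".ai")

# Text after the last dot, with the dot put back: ".ac", ".com", ".ai"
ext <- paste0(".", sub(".*\\.", "", emails))

# Keep addresses whose extracted extension is in the vector
keep <- emails[ext %in% extensions]
print(keep)
```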