Filter out elements in a list that contain any of the characters in another list

Time:06-20

characters_to_filter <- ",.#"

myVector <- c("Mac.", "ved", "der,", "ght#", "hoy")

Desired result: myNewVector <- c("ved", "hoy")

I tried str_detect, but it only works for one character. Is there a solution using dplyr?

Answer:

With grep, we build the pattern with sprintf (or paste) by wrapping characters_to_filter in square brackets. The brackets create a character class, which matches any one of the enclosed characters; inside [], a . is a literal dot, so nothing needs escaping. Alternatively, the characters could be joined with | (alternation), but then metacharacters must be escaped with \\, because an unescaped . matches any character. Finally, invert = TRUE returns the elements that do not match the pattern, and value = TRUE returns the elements themselves rather than their indices.

grep(sprintf('[%s]', characters_to_filter), myVector, invert = TRUE, value = TRUE)
[1] "ved" "hoy"
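To illustrate the remark about alternation, here is a small sketch (not part of the original answer) contrasting an unescaped | pattern with an escaped one. It uses perl = TRUE, under which a backslash before any non-alphanumeric character makes that character literal; chars, naive, pattern and kept are illustrative names.

```r
characters_to_filter <- ",.#"
myVector <- c("Mac.", "ved", "der,", "ght#", "hoy")

# Naive alternation: the unescaped "." matches any character, so every
# element matches and the inverted grep keeps nothing
naive <- grep(",|.|#", myVector, invert = TRUE, value = TRUE)
naive
# character(0)

# Split the filter string into single characters: ",", ".", "#"
chars <- strsplit(characters_to_filter, "")[[1]]
# Prefix each character with a backslash and join with |, giving the regex \,|\.|\#
pattern <- paste0("\\", chars, collapse = "|")
# perl = TRUE: backslash + non-alphanumeric character always means "this literal character"
kept <- grep(pattern, myVector, invert = TRUE, value = TRUE, perl = TRUE)
kept
# [1] "ved" "hoy"
```

The character class is usually the simpler choice; escaped alternation mainly helps when the alternatives are longer than one character.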

Or using str_subset

library(stringr)
str_subset(myVector, sprintf('[%s]', characters_to_filter), negate = TRUE)
[1] "ved" "hoy"

Or, if we want to use str_detect: it returns a logical vector, which can then be used as an index for subsetting

> str_detect(myVector, sprintf('[%s]', characters_to_filter), negate = TRUE)
[1] FALSE  TRUE FALSE FALSE  TRUE
> myVector[str_detect(myVector, sprintf('[%s]', characters_to_filter), negate = TRUE)]
[1] "ved" "hoy"

Or with str_c and str_detect

> myVector[str_detect(myVector, str_c('[', characters_to_filter, ']'), negate = TRUE)]
[1] "ved" "hoy"
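All the patterns above assume that no character in characters_to_filter has a special meaning inside [] (for example ], ^, - or \). A regex-free alternative, sketched here in base R rather than taken from the original answer, checks each character with fixed = TRUE and combines the results; it assumes characters_to_filter is non-empty, and chars, hits and has_any are illustrative names.

```r
characters_to_filter <- ",.#"
myVector <- c("Mac.", "ved", "der,", "ght#", "hoy")

# One single-character string per character to filter
chars <- strsplit(characters_to_filter, "")[[1]]
# fixed = TRUE matches each character literally, so no escaping is ever needed
hits <- lapply(chars, function(ch) grepl(ch, myVector, fixed = TRUE))
# Combine element-wise: TRUE if an element contains at least one of the characters
has_any <- Reduce(`|`, hits)
myNewVector <- myVector[!has_any]
myNewVector
# [1] "ved" "hoy"
```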

Answer:

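This approach removes elements that contain any punctuation character (not only the three in characters_to_filter):

  • .* matches zero or more characters
  • [[:punct:]] matches a punctuation character

so str_remove_all() deletes everything up to and including the last punctuation character, and str_subset(..., ".+") then drops the resulting empty strings. This works here because the punctuation comes at the end of each word; an element such as "a.bc" would become "bc" and be kept.

library(stringr)
str_subset(str_remove_all(myVector, ".*[[:punct:]]"), ".+")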
[1] "ved" "hoy"
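The caveat about punctuation in the middle of a word can be checked with a short base-R sketch; gsub() stands in for str_remove_all() here, and "a.bc" is a hypothetical extra element added only for the demonstration. Testing for the presence of punctuation and dropping the whole element avoids the problem.

```r
# "a.bc" is a hypothetical extra element with punctuation in the middle
x <- c("Mac.", "ved", "der,", "ght#", "hoy", "a.bc")

# Base-R counterpart of str_remove_all(x, ".*[[:punct:]]"): only the text up to
# and including the last punctuation character is deleted
removed <- gsub(".*[[:punct:]]", "", x)
removed
# values: "" "ved" "" "" "hoy" "bc"  ("bc" would survive a non-empty filter)

# Dropping every element that contains punctuation keeps only clean words
kept <- x[!grepl("[[:punct:]]", x)]
kept
# [1] "ved" "hoy"
```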

Answer:

If you want to use str_detect, you can negate it with the ! operator inside dplyr's filter():

library(dplyr)
library(stringr)  # str_detect() comes from stringr, not dplyr
data.frame(myVector) %>%
     filter(!str_detect(myVector, "[.,#]"))
  myVector
1      ved
2      hoy

[.,#] is a character class containing the characters to filter.
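The pattern above is typed out by hand. A minimal sketch, assuming dplyr and stringr are installed, builds the same character class from characters_to_filter so the filter follows the variable; df, pattern and result are illustrative names, and the same assumption about characters that are special inside [] applies.

```r
library(dplyr)
library(stringr)

characters_to_filter <- ",.#"
df <- data.frame(myVector = c("Mac.", "ved", "der,", "ght#", "hoy"),
                 stringsAsFactors = FALSE)

# Build "[,.#]" from the variable; assumes no character in characters_to_filter
# is special inside [] (for example ], ^, - or \)
pattern <- str_c("[", characters_to_filter, "]")

# Keep the rows whose value contains none of the characters
result <- df %>%
  filter(!str_detect(myVector, pattern))
result
```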
