characters_to_filter <- ",.#"
myVector <- c("Mac.", "ved", "der,", "ght#", "hoy")
Desired result: myNewVector <- c("ved", "hoy")
I tried str_detect but it only works for one character. Is there a solution using dplyr?
CodePudding user response:
With grep, we construct the pattern with either paste or sprintf by wrapping 'characters_to_filter' inside square brackets ([]), so that it matches any one of the characters. (Alternatively, we could paste the characters together with |, but some of them would then need to be escaped with \\, because . is a metacharacter that matches any character.) Specify invert = TRUE to return the elements of the vector that do not match the pattern, and value = TRUE to return the elements themselves rather than their indices.
grep(sprintf('[%s]', characters_to_filter), myVector, invert = TRUE, value = TRUE)
[1] "ved" "hoy"
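The alternation approach mentioned above (joining the characters with |) could look roughly like the following base R sketch. The helper names chars, escaped and alt_result are illustrative only, and perl = TRUE is an assumption made here so that a backslash before any non-alphanumeric character is treated as a literal.

```r
characters_to_filter <- ",.#"
myVector <- c("Mac.", "ved", "der,", "ght#", "hoy")

# Split the string into its individual characters: ",", ".", "#"
chars <- strsplit(characters_to_filter, "", fixed = TRUE)[[1]]

# Put a backslash in front of every non-alphanumeric character so that
# metacharacters such as "." are matched literally
escaped <- gsub("([^[:alnum:]])", "\\\\\\1", chars)

# Join the escaped characters with "|" to form an alternation
pattern <- paste(escaped, collapse = "|")

# Keep only the elements that do not match any alternative
alt_result <- grep(pattern, myVector, invert = TRUE, value = TRUE, perl = TRUE)
alt_result
```

The character class version is shorter because most metacharacters lose their special meaning inside [], which is why it needs no escaping for these three characters.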
Or using str_subset
library(stringr)
str_subset(myVector, sprintf('[%s]', characters_to_filter), negate = TRUE)
[1] "ved" "hoy"
Or, if we want to use str_detect, which returns a logical vector that is then used as an index for subsetting:
> str_detect(myVector, sprintf('[%s]', characters_to_filter), negate = TRUE)
[1] FALSE TRUE FALSE FALSE TRUE
> myVector[str_detect(myVector, sprintf('[%s]', characters_to_filter), negate = TRUE)]
[1] "ved" "hoy"
Or with str_c and str_detect
> myVector[str_detect(myVector, str_c('[', characters_to_filter, ']'), negate = TRUE)]
[1] "ved" "hoy"
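If stringr is not available, the same logical-index idea can be sketched in base R with grepl. This is only a minimal illustration; the names has_special and kept are made up for the example.

```r
characters_to_filter <- ",.#"
myVector <- c("Mac.", "ved", "der,", "ght#", "hoy")

# Build the character class "[,.#]" from the string of characters
pattern <- paste0("[", characters_to_filter, "]")

# grepl() returns TRUE for every element that contains one of the characters
has_special <- grepl(pattern, myVector)

# Negate the logical vector and use it as an index for subsetting
kept <- myVector[!has_special]
kept
```

Note that this simple construction assumes characters_to_filter does not contain characters that are special inside a character class, such as ], ^, - or a backslash.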
CodePudding user response:
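This one removes words containing a special character (it relies on the special character being the last character of the word, as in the example):
- .* matches zero or more characters
- [[:punct:]] matches a punctuation character
str_remove_all therefore deletes everything up to and including the last punctuation character, which leaves an empty string for those words. Then use str_subset(..., ".+") to drop the empty strings.
library(stringr)
str_subset(str_remove_all(myVector, ".*[[:punct:]]"), ".+")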
[1] "ved" "hoy"
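One thing to keep in mind with this approach is that it deletes text up to the last punctuation character rather than testing whether the word contains one. The base R sketch below illustrates the difference, using sub as a stand-in for str_remove_all and a hypothetical extra input "a.b" that is not part of the question.

```r
# "a.b" is a hypothetical extra input with punctuation in the middle
words <- c("Mac.", "ved", "a.b", "hoy")

# Stripping everything up to the last punctuation character leaves "b"
# behind for "a.b", so it survives the empty-string filter
stripped <- sub(".*[[:punct:]]", "", words)
after_strip <- stripped[nzchar(stripped)]

# Testing for punctuation directly drops "a.b" entirely
no_punct <- grep("[[:punct:]]", words, invert = TRUE, value = TRUE)

after_strip
no_punct
```

For the vector in the question both variants give the same result, because every special character is at the end of its word.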
CodePudding user response:
If you want to use str_detect, you can filter using the negation operator ! in dplyr:
library(dplyr)
library(stringr)
data.frame(myVector) %>%
  filter(!str_detect(myVector, "[.,#]"))
  myVector
1      ved
2      hoy
[.,#] is a character class containing the characters to filter. Inside a character class, . is matched literally, so it does not need to be escaped.
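To avoid hard-coding the character class, the pattern could also be built from characters_to_filter, as in the question. The following is a minimal sketch assuming dplyr and stringr are installed; the names df, pattern and filtered are illustrative.

```r
library(dplyr)
library(stringr)

characters_to_filter <- ",.#"
df <- data.frame(
  myVector = c("Mac.", "ved", "der,", "ght#", "hoy"),
  stringsAsFactors = FALSE
)

# Wrap the characters in square brackets to get the class "[,.#]"
pattern <- str_c("[", characters_to_filter, "]")

# Keep the rows whose value does not contain any of the characters
filtered <- df %>%
  filter(!str_detect(myVector, pattern))
filtered
```

As with the other character class examples, this assumes the string contains no characters that are special inside [], such as ], ^, - or a backslash.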