I want to filter the rows of a data frame (containing words) to keep only the words that are made up of certain letters. For instance, let's say I have a data frame such as:
library(tidyverse)
df <- data.frame(words = c("acerbe", "malus", "as", "clade", "after", "sel", "moineau") )
words
1 acerbe
2 malus
3 as
4 clade
5 after
6 sel
7 moineau
I want to keep only the rows (words) that are made of the following letters (and only them):
letters <- c("a", "z", "e", "r", "q", "s", "d", "f", "w", "x", "c")
In other words, I want to exclude words that contain letters other than those listed above.
I have tried using stringr::str_detect(), but without success so far...
letters <- "a|z|e|r|q|s|d|f|w|x|c"
df <- data.frame(words = c("acerbe", "malus", "as", "clade", "after", "sel", "moineau") )
df %>% filter(str_detect(string = words, pattern = letters, negate = FALSE) )
words
1 acerbe
2 malus
3 as
4 clade
5 after
6 sel
7 moineau
CodePudding user response:
I would use a grepl approach here:
letters <- c("a", "z", "e", "r", "q", "s", "d", "f", "w", "x", "c")
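regex <- paste0("^[", paste(letters, collapse=""), "]+$")
df$words[grepl(regex, df$words)]
[1] "as"
Note that the regex pattern being used here with grepl is:
^[azerqsdfwxc]+$
The anchors ^ and $ together with the + quantifier require every character of the word, from start to end, to come from the bracketed set.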
The only word in your input data frame that consists solely of these letters happens to be as.
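Since the question started from str_detect(), here is a rough sketch of the same anchored character class used inside a dplyr filter. It assumes dplyr and stringr are installed and that none of the allowed letters is special inside a regex character class (such as ], ^, - or \). The names allowed, pattern and result are mine, not from the question.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  words = c("acerbe", "malus", "as", "clade", "after", "sel", "moineau"),
  stringsAsFactors = FALSE
)

# Allowed letters (named `allowed` here to avoid masking base R's built-in `letters`)
allowed <- c("a", "z", "e", "r", "q", "s", "d", "f", "w", "x", "c")

# Anchored character class: the whole word must be one or more allowed letters
pattern <- paste0("^[", paste(allowed, collapse = ""), "]+$")

# Keep only the rows whose word matches the anchored pattern
result <- df %>% filter(str_detect(words, pattern))
result
```

The original attempt with "a|z|e|r|q|s|d|f|w|x|c" kept every row because an unanchored alternation only asks whether a word contains at least one of the letters, and every word in the example does.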
CodePudding user response:
A dplyr approach:
df %>%
  rowwise() %>%
  # keep a word only if its count of allowed letters equals its length
  filter(sum(str_count(words, letters)) == nchar(words))
# A tibble: 1 x 1
# Rowwise:
words
<chr>
1 as
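If you would rather avoid both regex and rowwise(), a set-membership check in base R is another option. This is only a sketch of a different technique (split each word into characters and test them with %in%), not what either answer above does. The helper only_allowed() and the names allowed, keep and kept are hypothetical names introduced for illustration.

```r
df <- data.frame(
  words = c("acerbe", "malus", "as", "clade", "after", "sel", "moineau"),
  stringsAsFactors = FALSE
)

# Allowed letters (named `allowed` to avoid masking base R's built-in `letters`)
allowed <- c("a", "z", "e", "r", "q", "s", "d", "f", "w", "x", "c")

# Hypothetical helper: TRUE when every character of `word` is in `allowed`
only_allowed <- function(word, allowed) {
  chars <- strsplit(word, "", fixed = TRUE)[[1]]
  all(chars %in% allowed)
}

# One logical value per word, then subset the data frame with it
keep <- vapply(df$words, only_allowed, logical(1), allowed = allowed, USE.NAMES = FALSE)
kept <- df[keep, , drop = FALSE]
kept
```

Because the letters are compared as plain strings, this version does not need any escaping if the allowed set ever includes characters that are special in a regex.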