Filter strings that contain only certain letters in R

Time:10-25

I want to filter the rows of a data frame (containing words) to keep only the words that are made up exclusively of a given set of letters. For instance, let's say I have a data frame such as:

library(tidyverse)

df <- data.frame(words = c("acerbe", "malus", "as", "clade", "after", "sel", "moineau") )

   words
1 acerbe
2  malus
3     as
4  clade
5  after
6    sel
7 moineau

I want to keep only the rows (words) that are made of the following letters (and only them):

letters <-  c("a", "z", "e", "r", "q", "s", "d", "f", "w", "x", "c")

In other words, I want to exclude words that contain any letter other than those listed above.

I have tried using stringr::str_detect(), but without success so far: every row is returned.

letters <- "a|z|e|r|q|s|d|f|w|x|c"

df <- data.frame(words = c("acerbe", "malus", "as", "clade", "after", "sel", "moineau") )
df %>% filter(str_detect(string = words, pattern = letters, negate = FALSE) )

    words
1  acerbe
2   malus
3      as
4   clade
5   after
6     sel
7 moineau

CodePudding user response:

I would use a grepl approach here:

letters <-  c("a", "z", "e", "r", "q", "s", "d", "f", "w", "x", "c")
regex <- paste0("^[", paste(letters, collapse=""), "]+$")
df$words[grepl(regex, df$words)]

[1] "as"

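Note that the regex pattern being used here with grepl is:

^[azerqsdfwxc]+$

The anchors ^ and $ together with the + quantifier require every character of the word to come from the character class. The only word in your input data frame that contains only these letters happens to be "as".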
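
For comparison, here is a minimal sketch of the same anchored pattern used with stringr::str_detect() inside dplyr::filter(), next to the unanchored alternation from the question. The name `allowed` is my own choice (not from the question), used so that base R's built-in `letters` vector is not masked.

```r
library(dplyr)
library(stringr)

df <- data.frame(words = c("acerbe", "malus", "as", "clade", "after", "sel", "moineau"))

# `allowed` is a hypothetical name, chosen to avoid masking base R's `letters`
allowed <- c("a", "z", "e", "r", "q", "s", "d", "f", "w", "x", "c")

# Unanchored alternation (as in the question): TRUE if ANY allowed letter occurs
any_match <- str_detect(df$words, paste(allowed, collapse = "|"))

# Anchored character class: TRUE only if EVERY character is an allowed letter
pattern <- paste0("^[", paste(allowed, collapse = ""), "]+$")
all_match <- str_detect(df$words, pattern)

# Keep only the rows whose word is made entirely of allowed letters
result <- df %>% filter(str_detect(words, pattern))
```

Every word in the sample contains at least one of the listed letters, which is why the alternation pattern keeps all seven rows, while the anchored pattern keeps only "as".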

CodePudding user response:

A dplyr approach:

df %>%
  rowwise() %>%
  filter(sum(str_count(words, letters)) == nchar(words))

# A tibble: 1 x 1
# Rowwise: 
  words
  <chr>
1 as
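
The same letter-by-letter idea can also be sketched in base R, without rowwise() or a regex: split each word into characters and check that all of them are in the allowed set. This is only an illustrative variant; `allowed` and `only_allowed` are names I made up for the example.

```r
df <- data.frame(words = c("acerbe", "malus", "as", "clade", "after", "sel", "moineau"))

# `allowed` is a hypothetical name, chosen to avoid masking base R's `letters`
allowed <- c("a", "z", "e", "r", "q", "s", "d", "f", "w", "x", "c")

# Split each word into single characters, then require every one to be allowed
chars_per_word <- strsplit(as.character(df$words), "")
only_allowed <- vapply(chars_per_word,
                       function(chars) all(chars %in% allowed),
                       logical(1))

# Subset the data frame, keeping it a data frame even with one column
kept <- df[only_allowed, , drop = FALSE]
```

Note that an empty string would pass this check, since all() of an empty vector is TRUE; filter those out separately if they can occur in your data.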