String match repeated letter and ignore other letters between the repetitions

I have a list of words. I want to count the words in which a certain letter appears more than once. I don't mind how many times the letter appears, as long as it appears at least twice, and I don't mind whether the repetitions are adjacent or not. For example, I want to include both "ppa" and "pepa".

fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

Say this is my list. My target letter is "p". I want to count words that have at least two "p". So I want to count "apple", "pineapple", and "papaya". The number I want to obtain is 3.

I've tried

str_detect(fruit, "p[abcdefghijklmnoqrstuvwxyz]p")

But this does not count "apple" and "pineapple", because the pattern requires exactly one non-"p" letter between the two "p"s. Is there a way to have all three words included?

CodePudding user response：

One way to approach the problem without a complicated regex is to count the number of 'p' in each element of fruit. This can be done using the str_count function from stringr.

library(stringr)
fruit[str_count(fruit, 'p') > 1]
#[1] "apple"     "pineapple" "papaya"

If you want 3 as the output, you can sum the logical vector instead of subsetting with it.

sum(str_count(fruit, 'p') > 1)
#[1] 3

Here str_count returns the number of times the pattern, which in our case is 'p', occurs in each string.

str_count(fruit, 'p')
#[1] 2 0 1 3 2
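
If stringr is not available, the same per-word counts can be obtained in base R. The following is a minimal sketch, not part of the original answer; the variable name p_counts is only illustrative. It locates every literal 'p' with gregexpr, extracts the matches with regmatches, and counts them with lengths.

```r
fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

# gregexpr finds every literal "p" in each word (fixed = TRUE disables regex
# interpretation); regmatches extracts the matches and lengths counts them.
p_counts <- lengths(regmatches(fruit, gregexpr("p", fruit, fixed = TRUE)))
p_counts
#[1] 2 0 1 3 2

# Number of words with at least two "p"
sum(p_counts > 1)
#[1] 3
```

Words without any match yield an empty match vector, so their count is 0 rather than an error.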

CodePudding user response：

If you really want to use regex to solve this problem, one of the many ways could be:

p[a-zA-Z]*p

The regex looks for two 'p's separated by zero or more letters, so it matches both adjacent repetitions ("pp") and separated ones ("pap"). The number of words that match is the output you are looking for.
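
As a sketch of how this pattern could be plugged into the str_detect call from the question (the variable name has_two_p is only illustrative):

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

# TRUE for each word that contains two "p"s with zero or more letters between them
has_two_p <- str_detect(fruit, "p[a-zA-Z]*p")

fruit[has_two_p]
#[1] "apple"     "pineapple" "papaya"

# Summing the logical vector gives the number of matching words
sum(has_two_p)
#[1] 3
```

Note that [a-zA-Z]* only spans letters, so two 'p's separated by a space, digit, or hyphen would not match; if that matters for your data, "p.*p" is a looser alternative.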
