I have a list of words. I want to count the words in which a certain letter appears repeatedly. I don't mind how many times the letter appears, as long as it appears at least twice, and I don't mind whether the repetitions are adjacent or not. For example, I want to include both "ppa" and "pepa".
fruit <- c("apple", "banana", "pear", "pineapple", "papaya")
Say this is my list. My target letter is "p". I want to count words that have at least two "p". So I want to count "apple", "pineapple", and "papaya". The number I want to obtain is 3.
I've tried
str_detect(fruit, "p[abcdefghijklmmoqrstuvwxyz]p")
But this does not count "apple" and "pineapple". Is there a way to have all three words included?
CodePudding user response:
A simpler way that avoids crafting a regex is to count the number of 'p' in each element of fruit. This can be done using the str_count function.
library(stringr)
fruit[str_count(fruit, 'p') > 1]
#[1] "apple" "pineapple" "papaya"
If you want the output to be 3, you can sum the logical vector instead of subsetting.
sum(str_count(fruit, 'p') > 1)
#[1] 3
Here str_count returns the number of times the pattern, in our case 'p', occurs in each string.
str_count(fruit, 'p')
#[1] 2 0 1 3 2
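If stringr is not available, the same per-word counts can probably be reproduced in base R. This is a minimal sketch of an alternative, not part of the approach above; the names p_counts and n_words are illustrative only.

```r
fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

# gregexpr() locates every literal "p" in each word, regmatches() extracts
# those matches, and lengths() counts them per word (0 when nothing matches).
p_counts <- lengths(regmatches(fruit, gregexpr("p", fruit, fixed = TRUE)))

# Number of words that contain at least two "p"
n_words <- sum(p_counts > 1)
```

Here fixed = TRUE treats "p" as a literal string rather than a regex, which keeps the example closer to the "no regex" spirit of this answer.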
CodePudding user response:
If you really want to use regex to solve this problem, one of the many ways could be:
p[a-zA-Z]*p
The regex matches any word that contains at least two 'p' with zero or more letters between them, so both adjacent ("pp") and separated ("pap") repetitions match. The number of words that match is the output you are looking for.
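A minimal sketch of applying this pattern to the fruit vector from the question, assuming base R's grepl is acceptable (stringr's str_detect should behave the same way with this pattern). The name has_two_p is illustrative only.

```r
fruit <- c("apple", "banana", "pear", "pineapple", "papaya")

# TRUE for each word that contains two "p" with zero or more letters between them
has_two_p <- grepl("p[a-zA-Z]*p", fruit)

# Words that matched
matched <- fruit[has_two_p]

# Count of matching words
n_matched <- sum(has_two_p)
```

Because the `*` quantifier allows zero letters between the two 'p', "apple" and "pineapple" are matched as well as "papaya".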