How do I find words with three or more vowels (of the same kind) with regex using back referencing?

Time:10-06

How can I find words with three or more vowels of the same kind with a regular expression using back referencing?

I'm searching a text file in a three-column, tab-separated format: "Word PoS Lemma".

This is what I have so far:

ggrep -P -i --colour=always '^\w*([aeioueöäüèéà])\w*?\1\w*?\1\w*?\t' filename

However, this gave me words with three vowels, but not necessarily of the same kind. I was confused, because I thought the back reference would match the same vowel that was captured by the bracket group. I solved this problem by changing the .*? to \w*?.

But I still need to know how I can achieve the "or more" part.

Thanks for the help!

CodePudding user response:

Your regex looks too complicated. I'm not sure what you're trying to accomplish with the .*?, but the usage looks suspect. I'd use something like:

([aeioueöäüèéà])\1\1

That is, capture a vowel in a group, then require the same vowel twice more immediately after it.

Edit: I didn't realise you wanted to allow other letters between the vowels. In that case, allow zero or more word characters (\w) before each back reference:

([aeioueöäüèéà])(\w*\1){2}
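
To see how this behaves on the three-column format, here is a minimal sketch. It assumes GNU grep (back references with -E are an extension there), uses made-up sample lines, and restricts the vowel set to plain ASCII vowels to keep the example locale-independent. The leading ^[[:alpha:]]* is an addition that keeps the whole match inside the first column, since [[:alpha:]] cannot match a tab.

```shell
# Hypothetical sample data: Word<TAB>PoS<TAB>Lemma, one token per line.
sample=$(printf 'banana\tNN\tbanana\ntree\tNN\ttree\nabracadabra\tNN\tabracadabra\nhouse\tNN\thouse\n')

# ^[[:alpha:]]*          start at the beginning of the line, stay in column 1
# ([aeiou])              capture one vowel (ASCII-only set, an assumption)
# ([[:alpha:]]*\1){2,}   the same vowel again, at least two more times,
#                        with any letters in between
pattern='^[[:alpha:]]*([aeiou])([[:alpha:]]*\1){2,}'

# Keep matching lines and print only the Word column.
matches=$(printf '%s\n' "$sample" | grep -E "$pattern" | cut -f1)
printf '%s\n' "$matches"
```

On this sample the script prints banana and abracadabra; tree (two e's) and house (three different vowels) are filtered out. For selecting lines, {2} and {2,} give the same result, because grep only needs some match on the line, so a word with four or five repeats already satisfies {2}. The open-ended form mainly matters with -o, where the matched text itself is printed.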

CodePudding user response:

Using grep

$ grep -E '([aeioueöäüèéà])([[:alpha:]]*\1){2,}' input_file
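
One caveat when adapting patterns like these: in POSIX regular expressions a backslash inside a bracket expression is an ordinary character, so something like [^\1] excludes a literal backslash and the digit 1 rather than the captured text. A small check, assuming GNU grep -E and made-up input strings:

```shell
# (a)[^\1] : the group captures "a"; the bracket expression excludes
# the characters "\" and "1", not the captured "a".
pattern='(a)[^\1]'

# "aa": the second "a" is accepted by [^\1], so the line matches.
if printf 'aa\n' | grep -qE "$pattern"; then same_char=match; else same_char=nomatch; fi

# "a1": the literal digit 1 is what the bracket excludes, so no match.
if printf 'a1\n' | grep -qE "$pattern"; then digit=match; else digit=nomatch; fi

printf 'aa: %s\na1: %s\n' "$same_char" "$digit"
```

This is why the repeated vowel has to be expressed with a back reference outside the brackets, as in \1, and why the capturing group should sit outside the repeated part so it is captured only once.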