I've been trying to solve this problem for a few hours, but with no luck. The task is to write a regular expression that matches at least four words starting with the same letter. But these words do not have to be one after another.
This regex should be able to match a line like this:
cat color coral chat
but also one like this:
cat take boom candle creepy drum cheek
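To pin down what "at least four words starting with the same letter" means, here is a plain Python check that does not use a regex at all. It is a swapped-in technique, only meant as a reference to compare candidate patterns against. The function name is hypothetical, and it assumes words are separated by whitespace (a regex built on \w may split words differently, for example at punctuation).

```python
from collections import Counter


def has_four_same_initial(line: str) -> bool:
    """Hypothetical reference check: do at least four words share a first letter?"""
    # str.split() breaks on any run of whitespace, so words are non-blank runs
    initials = Counter(word[0] for word in line.split())
    return any(count >= 4 for count in initials.values())


print(has_four_same_initial("cat color coral chat"))                    # True
print(has_four_same_initial("cat take boom candle creepy drum cheek"))  # True
print(has_four_same_initial("cat take boom candle drum"))               # False
```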
Thank you!
So far I have this regex, but it only matches the words when they come right after one another:
(\w)\w+\s+\1\w+\s+\1\w+\s+\1
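A quick way to see the problem is to run that pattern through Python's re module against both sample lines. This is a sketch; the variable names are made up for illustration, and it assumes the pattern uses the + quantifiers shown above.

```python
import re

# The pattern from the question: the captured letter must start the next
# three words in a row, so nothing may sit between the matching words.
CONSECUTIVE = re.compile(r"(\w)\w+\s+\1\w+\s+\1\w+\s+\1")

line1 = "cat color coral chat"
line2 = "cat take boom candle creepy drum cheek"

print(CONSECUTIVE.search(line1) is not None)  # True: the four c-words are adjacent
print(CONSECUTIVE.search(line2) is not None)  # False: other words separate the c-words
```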
Answer:
If the line contains only words that can be matched with \w:
\b(\w)\w*(?:(?:\s+\w+)*?\s+\1\w*){3}
Explanation

- \b : A word boundary to prevent a partial word match
- (\w)\w* : Capture a single word character in group 1, followed by optional word characters
- (?: : Open a non-capture group so the whole part can be repeated
- (?:\s+\w+)*? : Lazily match 1+ whitespace chars and 1+ word chars in between, in case a word does not start with the character captured for the backreference
- \s+\1\w* : Match 1+ whitespace chars, a backreference to the same captured character and optional word characters
- ){3} : Close the non-capture group and repeat it 3 times
See a regex demo
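As a local check, the pattern can be run with Python's re module. This sketch uses re.search, so the four words may appear anywhere in the line; the helper name first_match is hypothetical.

```python
import re

# First answer's pattern: the captured letter must start three more words,
# with any number of non-matching words allowed in between.
FOUR_WORDS = re.compile(r"\b(\w)\w*(?:(?:\s+\w+)*?\s+\1\w*){3}")


def first_match(line: str):
    """Hypothetical helper: return the matched text, or None if there is no match."""
    m = FOUR_WORDS.search(line)
    return m.group(0) if m else None


print(first_match("cat color coral chat"))
print(first_match("cat take boom candle creepy drum cheek"))
print(first_match("cat take boom candle drum"))  # only two c-words, so None
```

The match runs from the first qualifying word to the fourth one, so any skipped words in between are part of the matched text.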
Note that \s can also match a newline.
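The effect of \s matching a newline shows up when the pattern is run on multi-line text. The sketch below assumes Python's re module; the one-line variant is an assumption, not part of the answer: it swaps \s for [^\S\n], a class meaning "whitespace except newline", so a match cannot cross a line break.

```python
import re

FOUR_WORDS = re.compile(r"\b(\w)\w*(?:(?:\s+\w+)*?\s+\1\w*){3}")
# Hypothetical variant: [^\S\n] matches any whitespace except "\n"
FOUR_WORDS_ONE_LINE = re.compile(r"\b(\w)\w*(?:(?:[^\S\n]+\w+)*?[^\S\n]+\1\w*){3}")

text = "cat color\ncoral chat"  # two c-words per line, four in total

print(FOUR_WORDS.search(text) is not None)           # True: the match spans the line break
print(FOUR_WORDS_ONE_LINE.search(text) is not None)  # False: no single line has four c-words
```

Running the original pattern on each line of text.splitlines() separately would give the same per-line behavior.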
If the words that start with the same character should be at least 2 characters long (as (\w)\w+ matches 2 or more characters):
\b(\w)\w+(?:(?:\s+\w+)*?\s+\1\w+){3}
See another regex demo.
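The difference between the two patterns only matters for one-letter words. A minimal sketch with Python's re module, using a made-up sample line that contains four a-words, one of which is the single letter "a":

```python
import re

# \w* after the captured letter: a one-letter word such as "a" also counts
ANY_LENGTH = re.compile(r"\b(\w)\w*(?:(?:\s+\w+)*?\s+\1\w*){3}")
# \w+ after the captured letter: only words of two or more characters count
TWO_PLUS = re.compile(r"\b(\w)\w+(?:(?:\s+\w+)*?\s+\1\w+){3}")

line = "a b apple c avocado d ant"  # four a-words, but "a" has only one character

print(ANY_LENGTH.search(line) is not None)  # True
print(TWO_PLUS.search(line) is not None)    # False: apple, avocado, ant are only three
```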
Answer:
Another idea to match lines with at least 4 words starting with the same letter:
\b(\w)(?:.*?\b\1){3}
This is not very accurate: it just checks that, to the right of the character captured in the first group by \b(\w), there are three more \b word boundaries, each followed by that same character \1, with .*? allowing any characters in between.
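The looser pattern can be tried the same way with Python's re module. This sketch only covers the sample lines below (the third one is made up and has just three c-words); since . does not match "\n" by default, the search stays within one line.

```python
import re

# Second answer: from a word-initial character, look for three more word
# boundaries followed by that same character, with anything (.*?) in between.
LOOSE = re.compile(r"\b(\w)(?:.*?\b\1){3}")

samples = [
    "cat color coral chat",
    "cat take boom candle creepy drum cheek",
    "cat dog cow bird crow",  # only three c-words
]
for s in samples:
    print(repr(s), LOOSE.search(s) is not None)
```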