For the following code,
import re
p = re.compile(r'\b(?:\w*[aeiou]){3}\w*',re.I)
print(p.findall('The group contains some of the most dangerous criminals in the country.'))
the regex matches any word with at least 3 vowels in it. The expected output format is
[('contains', 'ins'), ('dangerous', 'us'), ('criminals', 'als')]
(the second component runs from the last vowel to the last character of the word)
but I get
['contains', 'dangerous', 'criminals']
How can I make it output in my expected format?
CodePudding user response:
There are two options:
Match all the words, then transform the result with another regex (e.g. with a list comprehension):
last_vowel = re.compile(r'\w*([aeiou]\w*)$', re.I)
words = p.findall(…)
print([(w, *last_vowel.findall(w)) for w in words])
Change your regex to capture the word and the last vowel in separate capturing groups:
#                  vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv-- first group
p = re.compile(r'\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))', re.I)
#                                       ^^^^^^^^^^^^-- second group
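The first option can be sketched end to end as follows. This is only a minimal sketch: it assumes the sentence from the question as input, and the names `sentence` and `pairs` are illustrative, not part of the original answer.

```python
import re

sentence = 'The group contains some of the most dangerous criminals in the country.'

# Pattern from the question: words containing at least three vowels.
p = re.compile(r'\b(?:\w*[aeiou]){3}\w*', re.I)
# The greedy \w* backtracks to the last vowel; the group keeps that vowel
# and everything after it up to the end of the word.
last_vowel = re.compile(r'\w*([aeiou]\w*)$', re.I)

words = p.findall(sentence)
# Pair each matched word with its tail starting at the last vowel.
pairs = [(w, *last_vowel.findall(w)) for w in words]
print(pairs)
```

For this input the printed list should have the shape the question asks for, one `(word, tail)` tuple per matched word.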
CodePudding user response:
You can use a single regex with 2 capture groups, where group 2 is inside group 1 starting with the last vowel to the last character of the word.
Then re.findall will return a list of tuples of the 2 capture group values.
\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))
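How the number of capture groups changes what re.findall returns can be checked with a small sketch. The sample text and the variable names below are made up for illustration and do not come from the question.

```python
import re

text = "banana apple"

# No capture groups: each result is the whole match as a string.
no_groups = re.findall(r"\w+", text)
# One capture group: each result is that group's text only.
one_group = re.findall(r"(\w)\w*", text)
# Two capture groups: each result is a tuple with one entry per group.
two_groups = re.findall(r"((\w)\w*)", text)

print(no_groups)
print(one_group)
print(two_groups)
```

This is why the original pattern, which has only a non-capturing group, returns a flat list of words, while the pattern with two capture groups returns tuples.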
Explanation
\b                     A word boundary
(                      Capture group 1
  (?:\w*[aeiou]){2}    Repeat 2 times matching optional word chars and a vowel
  \w*                  Match optional word chars
  ([aeiou]\w*)         Capture group 2, match a vowel and optional word chars (because the preceding \w* parts are greedy, this is the last vowel in the word)
)                      Close group 1
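The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows comments. This is a sketch of an equivalent spelling, not a different regex; the names `pattern` and `sentence` are illustrative.

```python
import re

pattern = re.compile(r"""
    \b                      # a word boundary
    (                       # capture group 1: the whole word
        (?:\w*[aeiou]){2}   # twice: optional word chars, then a vowel
        \w*                 # optional word chars
        ([aeiou]\w*)        # capture group 2: a vowel and the rest of the word
    )                       # close group 1
""", re.I | re.VERBOSE)

sentence = 'The group contains some of the most dangerous criminals in the country.'
result = pattern.findall(sentence)
print(result)
```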
Example
import re
p = re.compile(r'\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))', re.I)
print(p.findall('The group contains some of the most dangerous criminals in the country.'))
Output
[('contains', 'ins'), ('dangerous', 'us'), ('criminals', 'als')]