Output regex in desired format

Time:09-26

For the following code,

import re 
p = re.compile(r'\b(?:\w*[aeiou]){3}\w*',re.I)

print(p.findall('The group contains some of the most dangerous criminals in the country.'))

The regex matches any word with at least 3 vowels in it. The expected output format is

[('contains', 'ins'), ('dangerous', 'us'), ('criminals', 'als')] 

(the second component should run from the last vowel to the last character of the word)

but I get

['contains', 'dangerous', 'criminals']

How can I make it produce my expected format?

Answer:

There are two options:

  1. Match all the words, then transform the result with another regex (e.g. with a list comprehension):

    last_vowel = re.compile(r'\w*([aeiou]\w*)$', re.I)  # greedy \w* backtracks, so the group starts at the last vowel
    words = p.findall(…)
    print([(w, *last_vowel.findall(w)) for w in words])
    
  2. Change your regex to capture the word and the last vowel in separate capturing groups:

    #                  vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv- first group
    p = re.compile(r'\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))',re.I)
    #                                       ^^^^^^^^^^^^-- second group
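Both options can be combined into one self-contained script to confirm they agree. This is a sketch that assumes the sample sentence from the question; the names `sentence`, `p2`, `two_step` and `one_step` are illustrative, not from the original answer.

```python
import re

sentence = 'The group contains some of the most dangerous criminals in the country.'

# Option 1: find the words first, then extract the tail starting at the last vowel.
p = re.compile(r'\b(?:\w*[aeiou]){3}\w*', re.I)
last_vowel = re.compile(r'\w*([aeiou]\w*)$', re.I)  # greedy \w* backtracks to the last vowel
words = p.findall(sentence)
two_step = [(w, *last_vowel.findall(w)) for w in words]

# Option 2: one pattern, with the word in group 1 and the tail in group 2.
p2 = re.compile(r'\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))', re.I)
one_step = p2.findall(sentence)

print(two_step)
print(one_step)
```

Option 2 avoids a second pass over each word, while option 1 keeps the original word-matching pattern unchanged.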
    

Answer:

You can use a single regex with 2 capture groups, where group 2 is nested inside group 1 and captures from the last vowel to the last character of the word.

Because the pattern has 2 capture groups, re.findall returns a list of tuples holding both group values.
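The tuple shape comes from how re.findall treats capture groups. A small sketch on a shortened, made-up input string (`text` and the patterns here are only for illustration):

```python
import re

text = 'contains dangerous'

# No capture groups: each item is the whole match.
whole = re.findall(r'\w+', text)        # ['contains', 'dangerous']
# One capture group: each item is that group's text, still a plain string.
first = re.findall(r'(\w)\w*', text)     # ['c', 'd']
# Two or more capture groups: each item is a tuple of the group values.
pairs = re.findall(r'((\w)\w*)', text)   # [('contains', 'c'), ('dangerous', 'd')]

print(whole, first, pairs)
```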

\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))

Explanation

  • \b A word boundary
  • ( Capture group 1
    • (?:\w*[aeiou]){2} Repeat 2 times matching optional word chars and a vowel
    • \w* Match optional word chars
    • ([aeiou]\w*) Capture group 2, match the last vowel (the 3rd or a later one) and optional word chars
  • ) Close group 1
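The explanation above maps directly onto a re.VERBOSE version of the same pattern, where each part carries its own comment. This is a sketch; the variable `result` is illustrative.

```python
import re

# Same pattern as above; re.VERBOSE ignores the layout whitespace and the # comments.
p = re.compile(r'''
    \b                      # a word boundary
    (                       # capture group 1: the whole word
      (?:\w*[aeiou]){2}     # two rounds of optional word chars followed by a vowel
      \w*                   # optional word chars
      ([aeiou]\w*)          # capture group 2: the last vowel and the rest of the word
    )                       # close group 1
''', re.I | re.VERBOSE)

result = p.findall('The group contains some of the most dangerous criminals in the country.')
print(result)
```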


Example

import re
p = re.compile(r'\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))', re.I)

print(p.findall('The group contains some of the most dangerous criminals in the country.'))

Output

[('contains', 'ins'), ('dangerous', 'us'), ('criminals', 'als')]
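Because group 2 must start at the last vowel even when a word has more than 3 vowels, it can help to cross-check the regex against a plain-Python version of the rule from the question. This is a sketch: the helper `last_vowel_tail`, the `VOWELS` set and the sample sentence are all hypothetical, chosen to include words with 4 or 5 vowels.

```python
import re

VOWELS = set('aeiouAEIOU')

def last_vowel_tail(word):
    """Hypothetical helper: slice from the last vowel to the end of the word."""
    for i in range(len(word) - 1, -1, -1):
        if word[i] in VOWELS:
            return word[i:]
    return ''

p = re.compile(r'\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))', re.I)

text = 'Education is an unusually beautiful idea.'
regex_result = p.findall(text)

# Reference result: every word with at least 3 vowels, paired with its tail.
manual_result = [
    (w, last_vowel_tail(w))
    for w in re.findall(r'\w+', text)
    if sum(c in VOWELS for c in w) >= 3
]

print(regex_result)
print(regex_result == manual_result)
```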