Regex find greedy and lazy matches and all in-between

Time:03-02

I have a sequence such as '01 02 09 02 09 02 03 05 09 08 09 ', and I want to find a subsequence that starts with 01 and ends with 09, with one to nine two-digit numbers (02, 03, 04, etc.) in between. This is what I have tried so far.

I'm using \w{2}\s (\w{2} to match the two digits and \s to match the whitespace). This can occur one to nine times, which gives (\w{2}\s){1,9}. The whole regex becomes (01\s(\w{2}\s){1,9}09\s). This returns the following result:

<regex.Match object; span=(0, 33), match='01 02 09 02 09 02 03 05 09 08 09 '>

If I make the quantifier lazy by appending ?, it returns the following result:

<regex.Match object; span=(0, 9), match='01 02 09 '>
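
For reference, both results can be reproduced with a short script. This is a minimal sketch that assumes the standard-library re module behaves like the regex module for this pattern (the reprs shown here say regex.Match, which suggests the third-party regex package was used); the variable names greedy and lazy are only illustrative.

```python
import re

s = "01 02 09 02 09 02 03 05 09 08 09 "

# Greedy: {1,9} consumes as many two-character chunks as it can
# before the final "09 " has to match.
greedy = re.search(r"01\s(\w{2}\s){1,9}09\s", s)

# Lazy: {1,9}? consumes as few chunks as it can.
lazy = re.search(r"01\s(\w{2}\s){1,9}?09\s", s)

print(greedy.span(), repr(greedy.group()))  # (0, 33) '01 02 09 02 09 02 03 05 09 08 09 '
print(lazy.span(), repr(lazy.group()))      # (0, 9) '01 02 09 '
```

Either way, a single search returns only one match per starting position, which is why the two in-between results never show up.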

How can I obtain the results in between too? The desired result would include all of the following:

<regex.Match object; span=(0, 9), match='01 02 09 '>
<regex.Match object; span=(0, 15), match='01 02 09 02 09 '>
<regex.Match object; span=(0, 27), match='01 02 09 02 09 02 03 05 09 '>
<regex.Match object; span=(0, 33), match='01 02 09 02 09 02 03 05 09 08 09 '>

CodePudding user response:

You can extract these strings using

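import re

s = "01 02 09 02 09 02 03 05 09 08 09 "
# Greedy match from "01" through the last " 09"
m = re.search(r'01(?:\s\w{2})+\s09', s)
if m:
    # Reverse the match, collect every overlapping "90 ... 10" suffix, then reverse each one back
    print([x[::-1] for x in re.findall(r'(?=\b(90.*?10$))', m.group()[::-1])])
# => ['01 02 09 02 09 02 03 05 09 08 09', '01 02 09 02 09 02 03 05 09', '01 02 09 02 09', '01 02 09']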

See the Python demo.

With the 01(?:\s\w{2})+\s09 pattern and re.search, you extract the substring from 01 to the last 09 (with one or more whitespace-separated chunks of two word characters in between).

The second step, [x[::-1] for x in re.findall(r'(?=\b(90.*?10$))', m.group()[::-1])], reverses the string and the pattern to get all overlapping matches from 09 back to 01, and then reverses each match to produce the final strings. The lookahead is what allows the matches to overlap: it captures text without consuming it, so the search can try again at every later 90.

You can also reverse the final list by adding [::-1] after the list comprehension: print( [x[::-1] for x in re.findall(r'(?=\b(90.*?10$))', m.group()[::-1])][::-1] ).
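
To make the reversal trick easier to follow, the one-liner can be unpacked into separate steps. This is only a sketch of the same idea; the intermediate names (longest, reversed_text, reversed_hits, results) are introduced here for illustration and are not part of the original snippet.

```python
import re

s = "01 02 09 02 09 02 03 05 09 08 09 "

# Step 1: greedy match from "01" through the last " 09".
m = re.search(r"01(?:\s\w{2})+\s09", s)
if m is None:
    raise ValueError("no 01 ... 09 sequence found")
longest = m.group()

# Step 2: reverse the text, so every candidate now ends at the same
# place (the reversed "01", i.e. "10" at the end of the string).
reversed_text = longest[::-1]  # '90 80 90 50 30 20 90 20 90 20 10'

# Step 3: the lookahead captures without consuming, so findall can
# report one capture for every "90" that starts a word.
reversed_hits = re.findall(r"(?=\b(90.*?10$))", reversed_text)

# Step 4: reverse each hit back, then reverse the list so the
# shortest result comes first.
results = [hit[::-1] for hit in reversed_hits][::-1]
print(results)
# ['01 02 09', '01 02 09 02 09', '01 02 09 02 09 02 03 05 09', '01 02 09 02 09 02 03 05 09 08 09']
```

Note that this pattern, like the one above, does not enforce the one-to-nine limit on the chunks in between; that check would have to be added separately.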

CodePudding user response:

Here is a non-regex approach that splits the string into tokens and post-processes them:

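s = '01 02 09 02 09 02 03 05 09 08 09 '.strip().split()
# Validate: starts with 01, ends with 09, 1 to 9 tokens in between, every token is a two-digit "0N"
assert s[0] == '01'        \
   and s[-1] == '09'       \
   and (3 <= len(s) <= 11) \
   and all(len(elem) == 2 and elem.isdigit() and elem[0] == '0' for elem in s)
# For each position i >= 2, find the next '09'; each distinct index closes one prefix
print([s[:i+1] for i in sorted({s.index('09', i) for i in range(2, len(s))})])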
# [
#    ['01', '02', '09'], 
#    ['01', '02', '09', '02', '09'], 
#    ['01', '02', '09', '02', '09', '02', '03', '05', '09'],
#    ['01', '02', '09', '02', '09', '02', '03', '05', '09', '08', '09']
# ]
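
If spans like those in the question are needed rather than token lists, a variant of the same idea could look as follows. This is a sketch and not the code above: prefix_matches is a hypothetical helper, it assumes the sequence must start at the first token, and its spans stop at the end of the closing 09, so they are one character shorter than the spans in the question, which include the trailing space.

```python
import re

def prefix_matches(text, start="01", end="09", min_mid=1, max_mid=9):
    """Return ((start, stop), substring) for every prefix that begins with
    `start`, ends with `end`, and has min_mid..max_mid tokens in between."""
    tokens = list(re.finditer(r"\S+", text))
    if not tokens or tokens[0].group() != start:
        return []
    first = tokens[0].start()
    found = []
    for i, tok in enumerate(tokens[1:], start=1):
        middle = i - 1  # tokens strictly between the first token and this one
        if tok.group() == end and min_mid <= middle <= max_mid:
            found.append(((first, tok.end()), text[first:tok.end()]))
    return found

s = "01 02 09 02 09 02 03 05 09 08 09 "
for span, sub in prefix_matches(s):
    print(span, repr(sub))
# (0, 8) '01 02 09'
# (0, 14) '01 02 09 02 09'
# (0, 26) '01 02 09 02 09 02 03 05 09'
# (0, 32) '01 02 09 02 09 02 03 05 09 08 09'
```

Unlike the assert-based version, this returns an empty list for input that does not start with 01 instead of raising.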