I have a sequence such as '01 02 09 02 09 02 03 05 09 08 09 ', and I want to find every subsequence that starts with 01 and ends with 09, with one to nine two-digit chunks (such as 02, 03, 04, etc.) in between. This is what I have tried so far.
I'm using \w{2}\s (\w{2} for matching the two digits, and \s for the whitespace). This can occur one to nine times, which leads to (\w{2}\s){1,9}. The whole regex becomes (01\s(\w{2}\s){1,9}09\s). This returns the following result:
<regex.Match object; span=(0, 33), match='01 02 09 02 09 02 03 05 09 08 09 '>
If I use the lazy quantifier ? (that is, {1,9}? instead of {1,9}), it returns the following result:
<regex.Match object; span=(0, 9), match='01 02 09 '>
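A minimal sketch that reproduces both results. It assumes the standard re module behaves the same way as the third-party regex module for these two patterns (the Match objects shown here come from regex); the names greedy and lazy are only illustrative.

```python
import re

s = '01 02 09 02 09 02 03 05 09 08 09 '

# Greedy: the {1,9} group takes as many chunks as it can
# while still leaving a final "09 " to match.
greedy = re.search(r'01\s(\w{2}\s){1,9}09\s', s)

# Lazy: the {1,9}? group takes as few chunks as it can,
# so the match stops at the first "09 " that works.
lazy = re.search(r'01\s(\w{2}\s){1,9}?09\s', s)

print(greedy.span(), repr(greedy.group()))
print(lazy.span(), repr(lazy.group()))
```

Either way, a single search yields only one match per starting position, which is why the in-between results never show up.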
How can I obtain the results in between too? The desired result would include all of the following:
<regex.Match object; span=(0, 9), match='01 02 09 '>
<regex.Match object; span=(0, 15), match='01 02 09 02 09 '>
<regex.Match object; span=(0, 27), match='01 02 09 02 09 02 03 05 09 '>
<regex.Match object; span=(0, 33), match='01 02 09 02 09 02 03 05 09 08 09 '>
CodePudding user response:
You can extract these strings using
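import re
s = "01 02 09 02 09 02 03 05 09 08 09 "
# Step 1: greedy match from 01 up to the last 09
m = re.search(r'01(?:\s\w{2})+\s09', s)
if m:
    # Step 2: reverse the match, collect overlapping matches from 90 to the final 10, reverse each back
    print([x[::-1] for x in re.findall(r'(?=\b(90.*?10$))', m.group()[::-1])])
# => ['01 02 09 02 09 02 03 05 09 08 09', '01 02 09 02 09 02 03 05 09', '01 02 09 02 09', '01 02 09']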
With the 01(?:\s\w{2})+\s09 pattern and re.search, you can extract the substring from 01 to the last 09 (with any number of space-separated chunks of two word characters in between).
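The second step - [x[::-1] for x in re.findall(r'(?=\b(90.*?10$))', m.group()[::-1])] - is to reverse the string and the pattern to get all overlapping matches from 09 to 01, and then reverse them to get the final strings. The reversal is needed because a capturing group inside a lookahead yields one match per start position, so it can vary where a match starts but not where it ends. All the wanted substrings share the same start (01), and reversing the string turns them into substrings that share the same end (10 followed by $).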
You may also reverse the final list, so that the shortest match comes first, by adding [::-1] at the end of the list comprehension: print([x[::-1] for x in re.findall(r'(?=\b(90.*?10$))', m.group()[::-1])][::-1]).
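The two steps can also be written out one at a time, which makes the intermediate reversed string visible. This is only a sketch: it assumes the first pattern is 01(?:\s\w{2})+\s09 with a + quantifier, and the variable names (whole, rev, reversed_hits, results) are made up for illustration.

```python
import re

s = "01 02 09 02 09 02 03 05 09 08 09 "

# Step 1: greedy match from 01 to the last 09 (no trailing space).
whole = re.search(r'01(?:\s\w{2})+\s09', s).group()

# Step 2: reverse the match, so every wanted substring now ends
# at the same place (the final "10") and differs only in its start.
rev = whole[::-1]

# Step 3: a capturing group inside a lookahead is tried at every
# position, so findall returns overlapping matches, one per "90".
reversed_hits = re.findall(r'(?=\b(90.*?10$))', rev)

# Step 4: reverse each hit back, then reverse the list so the
# shortest result comes first.
results = [hit[::-1] for hit in reversed_hits][::-1]

print(rev)
for r in results:
    print(r)
```

Note that this approach does not enforce the one-to-nine limit on the chunks in between; it returns every prefix that ends in 09.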
CodePudding user response:
Here is a non-regex answer that splits the string into chunks and post-processes them:
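s = '01 02 09 02 09 02 03 05 09 08 09 '.strip().split()
# Sanity checks: starts with 01, ends with 09, 3 to 11 chunks, each chunk is two digits with a leading 0
assert s[0] == '01' \
    and s[-1] == '09' \
    and (3 <= len(s) <= 11) \
    and len(s) == len([elem for elem in s if len(elem) == 2 and elem.isdigit() and elem[0] == '0'])
# Every index of '09' from position 2 onward marks the end of one result
[s[:i+1] for i in sorted({s.index('09', i) for i in range(2, len(s))})]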
# [
# ['01', '02', '09'],
# ['01', '02', '09', '02', '09'],
# ['01', '02', '09', '02', '09', '02', '03', '05', '09'],
# ['01', '02', '09', '02', '09', '02', '03', '05', '09', '08', '09']
# ]
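The same idea can be wrapped in a function that returns an empty list instead of failing an assertion, and that applies the one-to-nine limit from the question to each result. This is a sketch of a variant, not the code above: the function name prefixes_ending_in_09 and the max_between parameter are hypothetical.

```python
def prefixes_ending_in_09(text, max_between=9):
    """Return every prefix of chunks that starts with 01 and ends with 09.

    Hypothetical helper: max_between caps how many chunks may sit
    between the opening 01 and the closing 09.
    """
    chunks = text.split()
    if not chunks or chunks[0] != '01':
        return []
    results = []
    for i, chunk in enumerate(chunks):
        # i is the index of the closing chunk, so i - 1 chunks
        # lie between the opening 01 and this 09.
        if chunk == '09' and 1 <= i - 1 <= max_between:
            results.append(chunks[:i + 1])
    return results


for prefix in prefixes_ending_in_09('01 02 09 02 09 02 03 05 09 08 09 '):
    print(' '.join(prefix))
```

Using enumerate visits each chunk once, whereas the set comprehension above calls s.index repeatedly; for inputs this short the difference does not matter.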