Python - improve regex expression

I built my own regular expression:

r'(\d+[x]\d+[-._](\w+))|(\d+[x]\d+\w+)'

alphanumeric 1x01-e02-03-04
hello-char 2x01-02-03_04
hello 3x02 char 2x01-02-03_04

I need to grab the substrings '1x01', 'e02', '03', '04', or '2x01', '02', and so on.

String length is variable, for example:

alphanumeric 1x01-e02-03-04

alphanumeric 1x01-e02

The first substring always has the form "nnnxnnn", where each n is a digit (at most three digits on each side) and the character 'x' is always present. 'e' is the only letter that can appear after the 'x' part, and it is not always present: a string may contain both 'e02' and a plain '03'. In both cases I need the integer.

Is it possible to improve it?

CodePudding user response：

You can use

import re

rx = re.compile(r'\b\d+x\d+(?:[-_]e?\d+)*')

texts = ['alphanumeric 1x01-e02-03-04',
'hello-char 2x01-02-03_04',
'hello 3x02 char 2x01-02-03_04']

for text in texts:
    print([re.split(r'[-_]e?', x) for x in rx.findall(text)])

Output:

[['1x01', '02', '03', '04']]
[['2x01', '02', '03', '04']]
[['3x02'], ['2x01', '02', '03', '04']]

Regex details:

\b - word boundary
\d+x\d+ - one or more digits, x, one or more digits
(?:[-_]e?\d+)* - zero or more repetitions of - or _, then an optional e, then one or more digits.
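The breakdown above can also be written into the pattern itself with re.VERBOSE, which ignores whitespace in the pattern and allows inline comments. This is only a sketch: the name EPISODE_RX is made up for illustration, and the quantifiers are assumed to be +, in line with the "one or more digits" description.

```python
import re

# EPISODE_RX is a hypothetical name; the pattern is the one described above,
# assuming "+" quantifiers ("one or more digits").
EPISODE_RX = re.compile(
    r"""
    \b              # word boundary: the match cannot start inside a word
    \d+ x \d+       # one or more digits, a literal x, one or more digits
    (?:             # non-capturing group for each trailing part
        [-_]        #   a separator, either - or _
        e?          #   an optional e
        \d+         #   one or more digits
    )*              # the trailing part may occur zero or more times
    """,
    re.VERBOSE,
)

print(EPISODE_RX.findall("hello 3x02 char 2x01-02-03_04"))
# ['3x02', '2x01-02-03_04']
```

Because the pattern has no capturing groups, findall returns the whole matched text for each hit rather than tuples of groups.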

After you get each match, you need to split it on the separators (_ or -), hence the use of re.split(r'[-_]e?', x): that pattern matches - or _ followed by an optional e, so the e is dropped together with the separator.
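If this is needed in more than one place, the match-then-split steps could be wrapped in a small function. The following is a minimal sketch; the names MATCH_RX, SPLIT_RX and extract_groups are hypothetical, and the patterns are the ones from this answer, assuming + quantifiers.

```python
import re

# Hypothetical names; both patterns come from the answer above.
MATCH_RX = re.compile(r"\b\d+x\d+(?:[-_]e?\d+)*")  # finds e.g. '1x01-e02-03-04'
SPLIT_RX = re.compile(r"[-_]e?")                   # separator plus optional e


def extract_groups(text):
    """Return one list per match: the 'NxNN' part, then the numbers after it."""
    return [SPLIT_RX.split(match) for match in MATCH_RX.findall(text)]


print(extract_groups("alphanumeric 1x01-e02-03-04"))
# [['1x01', '02', '03', '04']]
print(extract_groups("hello 3x02 char 2x01-02-03_04"))
# [['3x02'], ['2x01', '02', '03', '04']]
```

Compiling both patterns once at module level avoids passing raw pattern strings around on every call. Note that this sketch does not enforce the three-digit limit mentioned in the question.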

CodePudding user response：

This is very similar to this question about finding regex patterns that filter out items such as 1x01 or 101.

As a side note, I highly recommend regexr.com as a place to test regex patterns like these.