Why does this regex not match my groups in python?

Time:08-12

I have the following complete code example:

import re

examples = [
    "D1",       # expected: ('1')
    "D1sjdgf",  # ('1')
    "D1.2",     # ('1', '2')
    "D1.2.3",   # ('1', '2', '3')
    "D3.10.3x", # ('3', '10', '3')
    "D3.10.11"  # ('3', '10', '11')
]

for s in examples:
    result = re.search(r'^D(\d+)(?:\.(\d+)(?:\.(\d+)))', s)
    print(s, result.groups())

I want to match the one, two, or three numbers in an expression that always starts with the letter "D". I am not interested in anything after the last digit.

I would expect my regex to match e.g. D3.10.3x and return ('3', '10', '3'), but instead it returns only ('3',). I do not understand why.

^D(\d+)(?:\.(\d+)(?:\.(\d+)))

  • ^D matches "D" at the start
  • (\d+) matches the first number (one or more digits) inside a group.
  • (?: starts a non-matching group. I do not want to get this group back.
  • \. A literal point
  • (\d+) A group of one or more digits I want to "catch"

I also do not understand what a "non-capturing" group means in this context, as used in this answer.

CodePudding user response:

You may use this regex, which has a start anchor and two capture groups inside nested optional non-capture groups:

^D(\d+)(?:\.(\d+)(?:\.(\d+))?)?


Explanation:

  • ^: Start
  • D: Match letter D
  • (\d+): Match 1+ digits in capture group #1
  • (?:: Start outer non-capture group
    • \.: Match a dot
    • (\d+): Match 1+ digits in capture group #2
    • (?:: Start inner non-capture group
      • \.: Match a dot
      • (\d+): Match 1+ digits in capture group #3
    • )?: End inner optional non-capture group
  • )?: End outer optional non-capture group
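
To make the role of the non-capture groups and the trailing ? more concrete, here is a small sketch that compares three variants of the pattern. The variable names are only for illustration, and the first variant (capturing wrappers) is not part of the answer; it is included purely for contrast.

```python
import re

s = "D3.10.3x"

# Capturing wrappers: every pair of parentheses gets its own slot,
# so the ".10.3" and ".3" wrappers show up in the result as well.
with_capturing = re.search(r'^D(\d+)(\.(\d+)(\.(\d+))?)?', s)
print(with_capturing.groups())      # ('3', '.10.3', '10', '.3', '3')

# Non-capturing wrappers (?:...): they still group the sub-pattern so that
# ? can apply to it, but they are not reported by groups().
with_non_capturing = re.search(r'^D(\d+)(?:\.(\d+)(?:\.(\d+))?)?', s)
print(with_non_capturing.groups())  # ('3', '10', '3')

# Without the trailing ?, both wrappers are mandatory, so an input with
# fewer than three numbers does not match at all.
mandatory = re.search(r'^D(\d+)(?:\.(\d+)(?:\.(\d+)))', "D1.2")
print(mandatory)                    # None
```

In other words, (?:...) only controls what groups() reports, while ? decides whether that part of the input has to be present.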

Code Demo:

import re

examples = [
    "D1",       # expected: ('1')
    "D1sjdgf",  # ('1')
    "D1.2",     # ('1', '2')
    "D1.2.3",   # ('1', '2', '3')
    "D3.10.3x", # ('3', '10', '3')
    "D3.10.11"  # ('3', '10', '11')
]

rx = re.compile(r'^D(\d+)(?:\.(\d+)(?:\.(\d+))?)?')

for s in examples:
    result = rx.search(s)
    print(s, result.groups())

Output:

D1 ('1', None, None)
D1sjdgf ('1', None, None)
D1.2 ('1', '2', None)
D1.2.3 ('1', '2', '3')
D3.10.3x ('3', '10', '3')
D3.10.11 ('3', '10', '11')
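
If the None placeholders are unwanted, one possible follow-up is to filter them out after matching. This is a minimal sketch rather than part of the answer above; the helper name version_parts is hypothetical, and returning an empty tuple for non-matching input is an assumption about the desired behaviour.

```python
import re

rx = re.compile(r'^D(\d+)(?:\.(\d+)(?:\.(\d+))?)?')

def version_parts(s):
    """Return the matched numbers as a tuple, without None placeholders."""
    m = rx.search(s)
    if m is None:
        # Assumption: input that does not start with "D" plus a digit
        # yields an empty tuple instead of raising an error.
        return ()
    return tuple(g for g in m.groups() if g is not None)

print(version_parts("D1"))        # ('1',)
print(version_parts("D1.2"))      # ('1', '2')
print(version_parts("D3.10.3x"))  # ('3', '10', '3')
print(version_parts("X1.2"))      # ()
```

Checking for None before calling groups() also avoids the AttributeError that re.search would otherwise cause on input that does not match.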