Regex not matching letter followed by digits

Time:12-17

I am trying to match strings that start with a letter followed by 2 to 6 digits, anywhere in the line. The regex below matches R77 but not J123. Can anyone provide guidance on how to fix this?

import re

code_free = "[KG6.R77.1.2][J123-P1A-00194]/12C114"
o = re.search(r'(^|[^a-zA-Z0-9:])([a-zA-Z](\d{2,6}[a-zA-Z]?|\d{1}[xX]{1,2}))([^a-zA-Z0-9]|AP|DEV|$)', code_free)

print(o.group(2))  # prints R77; re.search returns only the first match

CodePudding user response:

If I understand correctly, just use re.findall with the pattern \b[A-Z]\d{2,6}\b:

import re

code_free = "[KG6.R77.1.2][J123-P1A-00194]/12C114"
# \b anchors the code at word boundaries; [A-Z] accepts uppercase letters only
codes = re.findall(r'\b[A-Z]\d{2,6}\b', code_free)
print(codes)  # ['R77', 'J123']
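
As a side note on why the original attempt printed only R77: re.search stops at the first match, so J123 is never reported. The following is a minimal sketch, keeping the pattern from the question unchanged and only swapping re.search for re.finditer; the variable names first_code and all_codes are made up for illustration.

```python
import re

code_free = "[KG6.R77.1.2][J123-P1A-00194]/12C114"

# Pattern copied unchanged from the question.
pattern = re.compile(
    r'(^|[^a-zA-Z0-9:])([a-zA-Z](\d{2,6}[a-zA-Z]?|\d{1}[xX]{1,2}))'
    r'([^a-zA-Z0-9]|AP|DEV|$)'
)

# re.search returns the first match only.
first = pattern.search(code_free)
first_code = first.group(2) if first else None
print(first_code)  # R77

# re.finditer walks over every non-overlapping match; group(2) is the code.
all_codes = [m.group(2) for m in pattern.finditer(code_free)]
print(all_codes)  # ['R77', 'J123']
```

finditer is used here instead of findall because the pattern has several capture groups, and findall would return a tuple of all groups per match.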

CodePudding user response:

Use this pattern with re.findall:

(?<![a-zA-Z0-9:])([a-zA-Z](?:\d{2,6}[a-zA-Z]?|\d[xX]{1,2}))(?=[^a-zA-Z0-9]|AP|DEV|$)

See regex proof.
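
A runnable sketch of this pattern, assuming the sample string from the question. The second input, "R77.J123", is a hypothetical string added here to show what the lookarounds buy: because the delimiters are checked but not consumed, two codes that share a single delimiter are both found.

```python
import re

code_free = "[KG6.R77.1.2][J123-P1A-00194]/12C114"

lookaround = re.compile(
    r'(?<![a-zA-Z0-9:])'                          # delimiter before, not consumed
    r'([a-zA-Z](?:\d{2,6}[a-zA-Z]?|\d[xX]{1,2}))' # the code, captured as group 1
    r'(?=[^a-zA-Z0-9]|AP|DEV|$)'                  # delimiter after, not consumed
)

# With a single capture group, findall returns that group for each match.
codes = lookaround.findall(code_free)
print(codes)  # ['R77', 'J123']

# Hypothetical input, not from the question: two codes share one '.'.
adjacent_codes = lookaround.findall("R77.J123")
print(adjacent_codes)  # ['R77', 'J123']
```

With the original consuming groups, the '.' after R77 would be swallowed by the first match, leaving nothing to satisfy the leading delimiter of J123 in the second input.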

EXPLANATION

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    [a-zA-Z0-9:]             any character of: 'a' to 'z', 'A' to
                             'Z', '0' to '9', ':'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [a-zA-Z]                 any character of: 'a' to 'z', 'A' to 'Z'
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      \d{2,6}                  digits (0-9) (between 2 and 6 times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
      [a-zA-Z]?                any character of: 'a' to 'z', 'A' to
                               'Z' (optional (matching the most
                               amount possible))
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      \d                       digits (0-9)
--------------------------------------------------------------------------------
      [xX]{1,2}                any character of: 'x', 'X' (between 1
                               and 2 times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^a-zA-Z0-9]             any character except: 'a' to 'z', 'A' to
                             'Z', '0' to '9'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    AP                       'AP'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    DEV                      'DEV'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
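
The breakdown above can also be kept next to the pattern itself. This is a sketch using re.VERBOSE, which ignores whitespace in the pattern and allows # comments; the name CODE_RE is an assumption for illustration, and the pattern is meant to be equivalent to the one-line version.

```python
import re

CODE_RE = re.compile(r"""
    (?<![a-zA-Z0-9:])          # not preceded by a letter, digit, or ':'
    (                          # group 1: the code itself
      [a-zA-Z]                 #   one leading letter
      (?:
          \d{2,6}[a-zA-Z]?     #   2 to 6 digits, optional trailing letter
        | \d[xX]{1,2}          #   or one digit, then 'x' or 'X' once or twice
      )
    )
    (?=[^a-zA-Z0-9]|AP|DEV|$)  # followed by a non-alphanumeric, AP, DEV, or end
""", re.VERBOSE)

code_free = "[KG6.R77.1.2][J123-P1A-00194]/12C114"
verbose_codes = CODE_RE.findall(code_free)
print(verbose_codes)  # ['R77', 'J123']
```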