I am trying to match strings that start with a letter followed by 2 to 6 digits, anywhere in the line, using the regex below. It matches R77 but not J123. Can anyone provide guidance on how to fix this?
import re
code_free = "[KG6.R77.1.2][J123-P1A-00194]/12C114"
o = re.search(r'(^|[^a-zA-Z0-9:])([a-zA-Z](\d{2,6}[a-zA-Z]?|\d{1}[xX]{1,2}))([^a-zA-Z0-9]|AP|DEV|$)', code_free)
print(o.group(2))  # prints R77
CodePudding user response:
If I understand correctly, just use re.findall with the pattern \b[A-Z]\d{2,6}\b:
import re
code_free = "[KG6.R77.1.2][J123-P1A-00194]/12C114"
codes = re.findall(r'\b[A-Z]\d{2,6}\b', code_free)
print(codes)  # ['R77', 'J123']
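For context on why the original attempt printed only R77: re.search stops at the first match, while re.findall and re.finditer keep scanning the rest of the string. A minimal sketch of the difference, reusing the pattern above (the variable names are only illustrative):

```python
import re

code_free = "[KG6.R77.1.2][J123-P1A-00194]/12C114"
pattern = re.compile(r'\b[A-Z]\d{2,6}\b')

# re.search returns a single match object for the first hit only.
first = pattern.search(code_free)
print(first.group(0))  # R77

# re.findall returns every non-overlapping match as a list of strings.
all_codes = pattern.findall(code_free)
print(all_codes)  # ['R77', 'J123']

# re.finditer yields match objects, useful when positions are needed too.
spans = [m.span() for m in pattern.finditer(code_free)]
print(spans)
```

Note that this simpler pattern only accepts an uppercase letter and does not allow the optional trailing letter or the x/X forms from the question's regex.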
CodePudding user response:
Use this pattern with re.findall:
(?<![a-zA-Z0-9:])([a-zA-Z](?:\d{2,6}[a-zA-Z]?|\d[xX]{1,2}))(?=[^a-zA-Z0-9]|AP|DEV|$)
See regex proof.
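A minimal sketch of how this pattern might be used. Since it has exactly one capturing group, re.findall returns a flat list of the captured codes. The second input string is made up purely for illustration and is not from the question.

```python
import re

# The pattern from this answer, split across lines for readability only.
CODE_RE = re.compile(
    r'(?<![a-zA-Z0-9:])'                          # no letter, digit or ':' just before
    r'([a-zA-Z](?:\d{2,6}[a-zA-Z]?|\d[xX]{1,2}))'  # the code itself (group 1)
    r'(?=[^a-zA-Z0-9]|AP|DEV|$)'                  # delimiter, AP, DEV or end must follow
)

code_free = "[KG6.R77.1.2][J123-P1A-00194]/12C114"
codes = CODE_RE.findall(code_free)
print(codes)  # ['R77', 'J123']

# Hypothetical input exercising the x/X alternative and the DEV suffix.
made_up = "Q5xx z99DEV"
extra = CODE_RE.findall(made_up)
print(extra)  # ['Q5xx', 'z99']
```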
EXPLANATION
--------------------------------------------------------------------------------
  (?<!                  look behind to see if there is not:
--------------------------------------------------------------------------------
    [a-zA-Z0-9:]        any character of: 'a' to 'z', 'A' to 'Z',
                        '0' to '9', ':'
--------------------------------------------------------------------------------
  )                     end of look-behind
--------------------------------------------------------------------------------
  (                     group and capture to \1:
--------------------------------------------------------------------------------
    [a-zA-Z]            any character of: 'a' to 'z', 'A' to 'Z'
--------------------------------------------------------------------------------
    (?:                 group, but do not capture:
--------------------------------------------------------------------------------
      \d{2,6}           digits (0-9) (between 2 and 6 times
                        (matching the most amount possible))
--------------------------------------------------------------------------------
      [a-zA-Z]?         any character of: 'a' to 'z', 'A' to 'Z'
                        (optional (matching the most amount possible))
--------------------------------------------------------------------------------
    |                   OR
--------------------------------------------------------------------------------
      \d                digits (0-9)
--------------------------------------------------------------------------------
      [xX]{1,2}         any character of: 'x', 'X' (between 1 and 2
                        times (matching the most amount possible))
--------------------------------------------------------------------------------
    )                   end of grouping
--------------------------------------------------------------------------------
  )                     end of \1
--------------------------------------------------------------------------------
  (?=                   look ahead to see if there is:
--------------------------------------------------------------------------------
    [^a-zA-Z0-9]        any character except: 'a' to 'z', 'A' to 'Z',
                        '0' to '9'
--------------------------------------------------------------------------------
  |                     OR
--------------------------------------------------------------------------------
    AP                  'AP'
--------------------------------------------------------------------------------
  |                     OR
--------------------------------------------------------------------------------
    DEV                 'DEV'
--------------------------------------------------------------------------------
  |                     OR
--------------------------------------------------------------------------------
    $                   before an optional \n, and the end of the string
--------------------------------------------------------------------------------
  )                     end of look-ahead
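One reason to prefer the lookarounds over the consuming groups in the question's pattern: a delimiter that has been consumed by one match cannot be reused as the left boundary of the next code. A rough sketch with a made-up input (not from the question) in which two codes share a single separator:

```python
import re

# The question's pattern: the delimiters on both sides are consumed by the match.
consuming = re.compile(
    r'(^|[^a-zA-Z0-9:])'
    r'([a-zA-Z](\d{2,6}[a-zA-Z]?|\d{1}[xX]{1,2}))'
    r'([^a-zA-Z0-9]|AP|DEV|$)'
)

# The lookaround version: the delimiters are checked but not consumed.
lookaround = re.compile(
    r'(?<![a-zA-Z0-9:])'
    r'([a-zA-Z](?:\d{2,6}[a-zA-Z]?|\d[xX]{1,2}))'
    r'(?=[^a-zA-Z0-9]|AP|DEV|$)'
)

sample = "R77.J123"  # hypothetical: one '.' separates the two codes

# The '.' is swallowed by the first match, so J123 has no left delimiter left.
consumed_codes = [m.group(2) for m in consuming.finditer(sample)]
print(consumed_codes)  # ['R77']

# With lookarounds the '.' stays available and both codes are found.
lookaround_codes = lookaround.findall(sample)
print(lookaround_codes)  # ['R77', 'J123']
```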