What would be the regex pattern for the following?

Time:07-22

I have multiple strings in the following formats. Examples:

A='AB.224-QW-2018'

B='AB.876-5-LS-2018'

C='AB.26-LS-18'

D='AB-123-6-LS-2017'

E='IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L'

F='ZX-ss-12L-AB-123-6-LS-2017-BC-22'

G='AB.224-2018'

H='AB.224/QW/2018'

I='AB/224/2018'

I want a regex pattern that captures the leading letters (AB), the number that comes right after them, and the year that appears later in the string. Some strings contain an extra single digit after the number, such as 876-5 in B and 123-6 in D. I don't want that single digit after the hyphen.

My code :

re.search(r"\D*\d*\D*(AB)\D*(\d+)\D*(20)?(\d{2})\D*\d*\D*")

Another attempt:

re.search(r"\D*\d*\D*(AB)\D*(\d+)\D*\d?\D*(20)?(\d{2})\D*\d*\D*")

Neither attempt works for all of them. Is there a pattern that matches all of the strings?

I have created groups in the regex pattern and extracted them as d.group(1) + "/" + d.group(2) + "/" + d.group(4). The expected output, if a pattern matches all of them, is as follows.

Expected Output

A='AB/224/18'

B='AB/876/18'

C='AB/26/18'

D='AB/123/17'



CodePudding user response:

Can't you just look for the last two digits, irrespective of dashes and "20" prefix? Like

(AB)[.-](\d+).*(\d\d)

I've tested in Sublime Text - works for me, it returns the same output you mentioned as desired.
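A quick way to check this pattern outside an editor is to run it through Python's re module. The sketch below is only an illustration: the helper name try_first and the choice of sample strings are assumptions, and the strings are passed without their surrounding quotes. It reproduces the expected output for A to D. Because .* is greedy, (\d\d) takes the last two digits anywhere in the string, and [.-] does not accept a slash, so E and I behave differently.

```python
import re

# The pattern from this answer, compiled once.
first_pattern = re.compile(r"(AB)[.-](\d+).*(\d\d)")

def try_first(text):
    """Hypothetical helper: return 'AB/<number>/<yy>' or None if there is no match."""
    m = first_pattern.search(text)
    return "/".join(m.groups()) if m else None

print(try_first("AB.224-QW-2018"))      # AB/224/18
print(try_first("AB.876-5-LS-2018"))    # AB/876/18
print(try_first("AB.26-LS-18"))         # AB/26/18
print(try_first("AB-123-6-LS-2017"))    # AB/123/17

# Greedy .* runs to the end of the string, so the trailing "22L" supplies the last two digits.
print(try_first("IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L"))  # AB/224/22

# "/" is not in the character class [.-], so nothing matches here.
print(try_first("AB/224/2018"))         # None
```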

CodePudding user response:

You could use 3 capture groups:

\b(AB)\D*(\d+)\S*?(?:20)?(\d\d)\b
  • \b A word boundary to prevent a partial word match
  • (AB) Capture AB in group 1
  • \D* Match optional non digits
  • (\d+) Capture 1 or more digits in group 2
  • \S*? Optionally match non-whitespace characters, as few as possible
  • (?:20)? Optionally match 20
  • (\d\d) Capture 2 digits in group 3
  • \b A word boundary
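The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and allows comments inside the pattern. This is a minimal sketch: the names PATTERN, extract and samples are made up for illustration, and the nine sample strings from the question are passed without their surrounding quotes.

```python
import re

# The three-group pattern, one commented piece per line.
PATTERN = re.compile(r"""
    \b(AB)      # group 1: the literal prefix AB, starting at a word boundary
    \D*         # optional non-digit separator such as . - or /
    (\d+)       # group 2: the first run of digits after AB
    \S*?        # as few non-whitespace characters as possible
    (?:20)?     # optional century prefix, not captured
    (\d\d)\b    # group 3: two digits that end at a word boundary
""", re.VERBOSE)

def extract(text):
    """Hypothetical helper: return 'AB/<number>/<yy>' or None if there is no match."""
    m = PATTERN.search(text)
    return "/".join(m.group(1, 2, 3)) if m else None

samples = {
    "A": "AB.224-QW-2018",
    "B": "AB.876-5-LS-2018",
    "C": "AB.26-LS-18",
    "D": "AB-123-6-LS-2017",
    "E": "IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L",
    "F": "ZX-ss-12L-AB-123-6-LS-2017-BC-22",
    "G": "AB.224-2018",
    "H": "AB.224/QW/2018",
    "I": "AB/224/2018",
}

for name, text in samples.items():
    print(name, extract(text))
```

The lazy \S*? is what lets the year be picked up at the first place where two digits end on a word boundary, instead of at the last digits in the string.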


For example, you can use re.finditer, which returns Match objects that each hold the group values.

Using enumerate, you can loop over the matches. Every item in the iteration is a tuple, where the first value is the count (which you don't need here) and the second value is the Match object.

import re

pattern = r"\b(AB)\D*(\d+)\S*?(?:20)?(\d\d)\b"

s = ("A='AB.224-QW-2018'\n"
            "B='AB.876-5-LS-2018'\n"
            "C='AB.26-LS-18'\n"
            "D='AB-123-6-LS-2017'\n"
            "IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L' F='ZX-ss-12L-AB-123-6-LS-2017-BC-22\n"
            "A='AB.224-QW-2018'\n"
            "B='AB.876-5-LS-2018'\n"
            "C='AB.26-LS-18'\n"
            "D='AB-123-6-LS-2017'\n"
            "E='IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L'\n"
            "F='ZX-ss-12L-AB-123-6-LS-2017-BC-22'\n"
            "G='AB.224-2018'\n"
            "H='AB.224/QW/2018'\n"
            "I='AB/224/2018'")

matches = re.finditer(pattern, s)

for _, m in enumerate(matches, start=1):
    print(m.group(1) + "/" + m.group(2) + "/" + m.group(3))

Output

AB/224/18
AB/876/18
AB/26/18
AB/123/17
AB/224/18
AB/123/17
AB/224/18
AB/876/18
AB/26/18
AB/123/17
AB/224/18
AB/123/17
AB/224/18
AB/224/18
AB/224/18
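Since the count from enumerate is not used, the loop could also iterate over re.finditer directly. The following is a small sketch of that variation, assuming a shortened input with only three of the sample lines; the name results is illustrative.

```python
import re

pattern = r"\b(AB)\D*(\d+)\S*?(?:20)?(\d\d)\b"

# Shortened input for illustration: three of the sample lines.
s = ("A='AB.224-QW-2018'\n"
     "D='AB-123-6-LS-2017'\n"
     "I='AB/224/2018'")

# m.groups() returns the three captured groups as a tuple of strings.
results = ["/".join(m.groups()) for m in re.finditer(pattern, s)]
print(results)  # ['AB/224/18', 'AB/123/17', 'AB/224/18']
```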