How can I match a word using a regular expression (re) in the following format: letter, digit, alphanumeric, dot (.), then 0 to 4 alphanumerics?
Examples:
A24.L
A2F.L9
A2F.LG4
This is what I've come up with so far:
answer=re.findall(r'[A-Za-z]\d\w\.\w{0-4}', text)
CodePudding user response:
As you are using re.findall, I assume you are looking for partial matches inside longer text. Bearing that in mind, you need to fix the following:
- \w matches not only alphanumeric chars, but also the _ char
- {0-4} is not a valid limiting ("range", or "interval") quantifier; the syntax is {min,max} (note that the min value should not be omitted: some regex engines allow that and use 0 as the default, but others either do not support it or do not handle it correctly)
- In Python 3, \d
matches any Unicode digit (like٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙0123456789
), so you probably want to use the (?a) inline modifier (to only match ASCII digits) or an explicit [0-9].
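The three points above can be checked quickly in an interpreter. This is a minimal sketch; the sample strings are made up for illustration and the variable names are not from the question.

```python
import re

# \w also accepts the underscore, so "A2_.L" slips through.
underscore_hits = re.findall(r'[A-Za-z]\d\w\.\w', 'A2_.L')
print(underscore_hits)   # ['A2_.L']

# {0-4} is not a quantifier, so Python's re treats it as the literal text "{0-4}".
literal_hits = re.findall(r'\w{0-4}', 'L9 x{0-4}')
print(literal_hits)      # ['x{0-4}']

# In a str pattern, \d also matches non-ASCII digits such as U+0665
# (ARABIC-INDIC DIGIT FIVE) unless (?a) / re.ASCII is set.
unicode_digits = re.findall(r'\d', '5 \u0665')
ascii_digits = re.findall(r'(?a)\d', '5 \u0665')
print(len(unicode_digits), ascii_digits)   # 2 ['5']
```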
So, you can use
answer=re.findall(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{1,4}\b', text)
if the alphanumeric after . is obligatory, and the following if the match can end in a dot:
answer=re.findall(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{0,4}(?<!\w\B)', text)
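As a rough illustration of how the two patterns differ, here is a small sketch that runs both on the examples from the question plus one made-up code ending in a dot (the names STRICT, LOOSE and the sample text are assumptions for the demo):

```python
import re

# At least one alphanumeric is required after the dot.
STRICT = re.compile(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{1,4}\b')
# The match may also end right after the dot.
LOOSE = re.compile(r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{0,4}(?<!\w\B)')

# Made-up sample text; "B7C." has nothing after the dot.
text = "Codes: A24.L, A2F.L9, A2F.LG4 and B7C. (no suffix)"

strict_hits = STRICT.findall(text)
loose_hits = LOOSE.findall(text)

print(strict_hits)  # ['A24.L', 'A2F.L9', 'A2F.LG4']
print(loose_hits)   # ['A24.L', 'A2F.L9', 'A2F.LG4', 'B7C.']
```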
Details:
- \b - word boundary
- [A-Za-z] - a letter
- [0-9] - an ASCII digit
- [A-Za-z0-9] - an ASCII alphanumeric
- \. - a . char
- [A-Za-z0-9]{1,4}\b - one to four alphanumeric chars at the word boundary.
The second regex does not contain a word boundary at the end since the match is supposed to be able to end in a . (that is not a word char). The (?<!\w\B) is a right-hand dynamic word boundary that only requires a non-word char or end position if the preceding char is a word char.
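To see why a plain \b does not work at the end of the second regex, the sketch below compares the two endings on made-up inputs. It also shows one behavior worth being aware of: when more than four alphanumerics follow the dot, the {0,4} part can backtrack to zero chars, so only the part up to the dot is returned.

```python
import re

core = r'\b[A-Za-z][0-9][A-Za-z0-9]\.[A-Za-z0-9]{0,4}'
plain = re.compile(core + r'\b')            # ordinary trailing word boundary
dynamic = re.compile(core + r'(?<!\w\B)')   # dynamic right-hand boundary

# "B7C." ends in a dot: \b fails between "." and " " (both are non-word chars).
plain_hits = plain.findall('B7C. B7C.X')
dynamic_hits = dynamic.findall('B7C. B7C.X')
print(plain_hits)    # ['B7C.X']
print(dynamic_hits)  # ['B7C.', 'B7C.X']

# Six alphanumerics after the dot: the quantifier backtracks to zero chars.
overlong_hits = dynamic.findall('A2F.LG4X9Z')
print(overlong_hits)  # ['A2F.']
```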
CodePudding user response:
The best way to solve these types of problems is via an online regex checker. You were very close. Only a slight modification is required.
Try:
[a-zA-Z][0-9]\w\.\w{0,4}
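A quick check of this pattern, as a sketch with made-up inputs. It finds the three examples from the question, but since it has no word boundaries and uses \w, it can also match inside a longer token and accept underscores, which may or may not be what you want.

```python
import re

pattern = re.compile(r'[a-zA-Z][0-9]\w\.\w{0,4}')

# The examples from the question.
sample_hits = pattern.findall('A24.L A2F.L9 A2F.LG4')
print(sample_hits)    # ['A24.L', 'A2F.L9', 'A2F.LG4']

# No boundaries and \w: a match starts mid-token, includes "_",
# and stops after four chars following the dot.
embedded_hits = pattern.findall('xxA2_.LG456')
print(embedded_hits)  # ['A2_.LG45']
```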