I have specific patterns composed of strings, numbers, and special characters in a specific order. I would like to check whether an input string matches one of the patterns I created, and print an error if the input is incorrect. I tried using regex for this,
but my code is not neat enough. Could someone help me with this?
Use case
I have the input att2_epic_app_clm1_sub_valid, which I split by _. Here is the list of patterns I expect to check against, printing an error if there is no match.
Rule:
The input should start with att and a number, like [att][0-6]*, or [ptt][0-6]; after that it should continue with either epic or semi; then it should continue with [app][0-6] or [app][0-6_][clm][0-9_] [sub|sup]; then it should end with [valid|Invalid].
So I composed this pattern with re, but when I pass invalid input it is not detected, and I expect an error instead.
import re
acceptable_pattern=re.compile(r'([att] [0-6_])(epic|semi_)([app] [0-6_] [clm] [0-6_])([sub|sup_])([valid|invalid]))'
input='att1_epic_app2_clm1_sub_valid' # this is valid string
wlist=input.split('_')
for each in wlist:
    if any(ext in each for ext in acceptable_pattern):
        print("valid")
    else:
        print("invalid")
This is not quite working, because I have to check the string from beginning to end after splitting it by _, where each new substring must match one of the predefined rules:
the input string should start with att|ptt followed by a digit between 1-6; the next word is either epic or semi; then it should be app, or app1~app6, or app{1_6}clm{1~6}{sub|sup_}; then the string ends with {valid|invalid}.
How should I specify those rules using re.compile to check the pattern in the input string and raise an error if the parts are not in sequence? How should we do this in Python? Is there a quick way of making this happen?
CodePudding user response:
Instead of using split, you could consider writing a pattern that validates the whole string.
If I am reading the requirements correctly, you might use:
^[ap]tt[0-6]_(?:epic|semi)_app(?:[1-6]|[1-6_]clm[0-9]*_su[bp])?_valid$
^                          Start of string
[ap]tt[0-6]                Match att or ptt and a digit 0-6
_(?:epic|semi)             Match _epic or _semi
_app                       Match literally
(?:                        Non-capture group for the alternation
[1-6]                      Match a digit 1-6
|                          Or
[1-6_]clm[0-9]*_su[bp]     Match a digit 1-6 or _, then clm followed by optional digits 0-9, and then _sub or _sup
)?                         Close the non-capture group and make it optional
_valid                     Match literally
$                          End of string
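As a rough sketch, the same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. The name PATTERN is only an illustration and is not part of the original answer.

```python
import re

# Same pattern as above, written with re.VERBOSE so each part carries a comment.
# PATTERN is a hypothetical name chosen for this sketch.
PATTERN = re.compile(r"""
    ^                            # start of string
    [ap]tt[0-6]                  # att or ptt followed by one digit 0-6
    _(?:epic|semi)               # _epic or _semi
    _app                         # literal _app
    (?:                          # optional non-capture group
        [1-6]                    # a digit 1-6
        |                        # or
        [1-6_]clm[0-9]*_su[bp]   # digit 1-6 or _, then clm, optional digits, _sub or _sup
    )?
    _valid                       # literal _valid
    $                            # end of string
""", re.VERBOSE)

print(bool(PATTERN.match("att2_epic_app_clm1_sub_valid")))
print(bool(PATTERN.match("att12_epic_app_clm1_sub_valid")))
```

The first call prints True and the second prints False, because att12 has two digits where the pattern allows only one.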
If the string can also start with dev then you can use an alternation:
^(?:[ap]tt|dev)[0-6]_(?:epic|semi)_app(?:[1-6]|[1-6_]clm[0-9]*_su[bp])?_valid$
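If more prefixes may be added later, one possible approach is to build the alternation from a list instead of editing the pattern by hand. This is only a sketch: PREFIXES and build_pattern are hypothetical names, and the list is assumed to hold att, ptt, and dev.

```python
import re

# Hypothetical list of accepted prefixes; "dev" comes from the example above.
PREFIXES = ["att", "ptt", "dev"]

def build_pattern(prefixes):
    """Compile the validation pattern for the given list of prefixes."""
    # re.escape guards against prefixes that contain regex metacharacters
    alternation = "|".join(re.escape(p) for p in prefixes)
    return re.compile(
        rf"^(?:{alternation})[0-6]_(?:epic|semi)_app"
        r"(?:[1-6]|[1-6_]clm[0-9]*_su[bp])?_valid$"
    )

pattern = build_pattern(PREFIXES)
print(bool(pattern.match("dev3_semi_app2_valid")))  # True
print(bool(pattern.match("xyz3_semi_app2_valid")))  # False
```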
Then you can check if there was a match:
import re

pattern = r"^(?:[ap]tt|dev)[0-6]_(?:epic|semi)_app(?:[1-6]|[1-6_]clm[0-9]*_su[bp])?_valid$"
strings = [
    "att2_epic_app_clm1_sub_valid",
    "att12_epic_app_clm1_sub_valid",
    "att2_epic_app_valid",
    "att2_epic_app_clm1_sub_valid"
]

# Report each string as valid or invalid depending on whether the pattern matches
for s in strings:
    m = re.match(pattern, s, re.M)
    if m:
        print("Valid: " + m.group())
    else:
        print("Invalid: " + s)
Output
Valid: att2_epic_app_clm1_sub_valid
Invalid: att12_epic_app_clm1_sub_valid
Valid: att2_epic_app_valid
Valid: att2_epic_app_clm1_sub_valid
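Since the question asks for an error on bad input, here is a minimal sketch that raises ValueError instead of printing. It assumes re.fullmatch is acceptable in place of the ^ and $ anchors; the validate function and the PATTERN name are hypothetical and not part of the code above.

```python
import re

# Same rules as the pattern above, without ^ and $ because fullmatch
# already requires the whole string to match.
PATTERN = re.compile(
    r"(?:[ap]tt|dev)[0-6]_(?:epic|semi)_app"
    r"(?:[1-6]|[1-6_]clm[0-9]*_su[bp])?_valid"
)

def validate(name):
    """Return name unchanged if it matches, otherwise raise ValueError."""
    if PATTERN.fullmatch(name) is None:
        raise ValueError(f"Invalid input: {name!r}")
    return name

for s in ["att2_epic_app_clm1_sub_valid", "att12_epic_app_clm1_sub_valid"]:
    try:
        validate(s)
        print("Valid:", s)
    except ValueError as err:
        print(err)
```

One reason to consider fullmatch here is that $ also matches just before a trailing newline, so a string such as "att2_epic_app_valid\n" would pass the anchored pattern but is rejected by fullmatch.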