This question is similar to my original post:
Unable to use conditional regex to test my string in python
I am posting a new question because the requirement here is a little different from the original one.
If the given string is processed line by line, the original answer is good enough. However, that answer does not cover the multiline case. See below.
Test case | Test string | Expected value from bool(re.match(...)) |
---|---|---|
1. Naive match | xxxx | True |
2. Bad model name | xxxx | False |
3. Missing model | xxxx | True |
I tried several regexes, but each of them fails on either test case (2) or test case (3).
Tried Regex | Failed on Test |
---|---|
(board add 0/1)? (?(1) (aaa|bbb)) | 2 |
^(?:(?!board add 0/1).)*$|board add 0/1 (?:aaa|bbb) | 2 |
board add 0/1 (aaa|bbb) | 3 |
(?=board add 0/1 )(?:board add 0/1 (aaa|bbb)) | 3 |
Is it possible to write a regex that makes all of the test cases above pass?
You can check them at the following URL:
https://regex101.com/r/2l2Qd4/1
NOTE:
- I just want to catch the particular board add 0/1 instead of board add 0/\d
- In my actual use case, interfaces may need different models. That's why I am trying to figure out a regex for board add 0/1 specifically. Then I can extend the regex to board add 0/2 through board add 0/21 one by one.
- Requirements of a valid string
  - If board add 0/1 exists in the string, it must be followed by (aaa|bbb). Otherwise, the string is invalid.
  - If board add 0/1 does not exist in the string, the string is valid.
CodePudding user response:
In that case, you can use this regex
board add 0/\d (?!aaa|bbb)
If the regex matches then the string is invalid.
Python Example
import re
strings = [
"""xxxx
xxxx
board add 0/1 aaa
board add 0/2 aaa
board add 0/3 bbb
board add 0/4 bbb
board add 0/5 aaa
#""",
"""xxxx
xxxx
board add 0/1 xxx
board add 0/2 aaa
board add 0/3 bbb
board add 0/4 aaa
board add 0/5 bbb
#""",
"""xxxx
xxxx
board add 0/2 aaa
board add 0/3 bbb
board add 0/4 bbb
board add 0/5 aaa
#"""
]
for string in strings:
    # A match means some board line is followed by an unknown model, so the string is invalid.
    print(not bool(re.search(r"board add 0/\d (?!aaa|bbb)", string)))
Output
True
False
True
Explanation
re.search scans the string and returns a match object for the first location where the pattern matches, or None if there is no match. The solution works by detecting the invalid case: if a board add 0/\d line (which includes board add 0/1) is followed by neither aaa nor bbb, the string is invalid. Everything else passes, as you described in your previous question. So, if re.search returns anything other than None, not bool(...) converts that value to the expected result.
NOTE: I'm using not bool(...) because the string is valid when it does not contain the pattern.
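The question asks about one specific interface (board add 0/1) rather than any digit, so here is a minimal sketch of the same negated-lookahead idea narrowed to a single interface. The helper name is_valid and its interface and models parameters are hypothetical, and the trailing \b is an added assumption that a model name must be a whole word (so aaab is rejected).

```python
import re

def is_valid(text, interface="0/1", models=("aaa", "bbb")):
    """Hypothetical helper: the text is invalid only when
    `board add <interface> ` is followed by something other than an allowed model."""
    allowed = "|".join(re.escape(m) for m in models)
    # Assumption: a model name ends at a word boundary, so "aaab" does not count as "aaa".
    pattern = rf"board add {re.escape(interface)} (?!(?:{allowed})\b)"
    # No match means no offending line was found, so the text is valid.
    return re.search(pattern, text) is None

print(is_valid("xxxx\nboard add 0/1 aaa\nboard add 0/2 aaa\n#"))
print(is_valid("xxxx\nboard add 0/1 xxx\nboard add 0/2 aaa\n#"))
print(is_valid("xxxx\nboard add 0/2 aaa\n#"))
```

Because the pattern requires a space right after the interface number, board add 0/12 is not mistaken for board add 0/1. Checking another interface would be a matter of calling is_valid(text, interface="0/2") with its own models.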
CodePudding user response:
You seem to want to match all board lines that end with either aaa or bbb, where those lines are indented and sit between lines that start with a non-whitespace character.
To prevent partial matches, you need to identify the parts before and after the repeating board part.
^\S.*(?:\n[^\S\n]+board add 0/\d+ (?:aaa|bbb))+\n\S
Explanation
- ^ Start of string
- \S.* Match a non whitespace char and the rest of the line
- (?: Non capture group to repeat as a whole part
  - \n[^\S\n]+ Match a newline followed by 1+ spaces
  - board add 0/\d+ (?:aaa|bbb) Match the board pattern where \d+ matches 1+ digits
- )+ Close the non capture group and repeat 1+ times to match at least a single line
- \n\S Match a newline and a non whitespace char
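A minimal sketch of how this block pattern might be tried in Python, written with one-or-more quantifiers on the indentation, the digits, and the repeated group, as the explanation describes. The re.MULTILINE flag is an assumption (so that ^ can anchor at any line start, not only the start of the string), and the sample strings with their two-space indentation are hypothetical.

```python
import re

# Block pattern with one-or-more quantifiers; re.MULTILINE is an assumption
# so that ^ can match at the start of any line.
BLOCK = re.compile(
    r"^\S.*(?:\n[^\S\n]+board add 0/\d+ (?:aaa|bbb))+\n\S",
    re.MULTILINE,
)

# Hypothetical samples; the two-space indentation is assumed.
good = "xxxx\n  board add 0/1 aaa\n  board add 0/2 bbb\n#"
bad = "xxxx\n  board add 0/1 xxx\n  board add 0/2 aaa\n#"

for sample in (good, bad):
    # True only when every indented board line in the block has an allowed model.
    print(bool(BLOCK.search(sample)))
```

Note that this pattern requires at least one board line, so a string with no board lines at all does not match; it also treats every interface number the same way rather than singling out 0/1.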