conditional regex on multiline string in python

Time:06-10

This question is similar to my original post, "Unable to use conditional regex to test my string in python".

I am posting a new question because the requirement here is slightly different from the original one.

If the given string is checked line by line, the original answer is good enough. However, that answer does not cover the case of a multiline string. See below.

Test case 1: Naive match
Test string:
xxxx
xxxx
board add 0/1 aaa
board add 0/2 aaa
board add 0/3 bbb
board add 0/4 bbb
board add 0/5 aaa
#
Expected value from bool(re.match(...)): True

Test case 2: Bad model name
Test string:
xxxx
xxxx
board add 0/1 xxx
board add 0/2 aaa
board add 0/3 bbb
board add 0/4 aaa
board add 0/5 bbb
#
Expected value from bool(re.match(...)): False

Test case 3: Missing model
Test string:
xxxx
xxxx
board add 0/2 aaa
board add 0/3 bbb
board add 0/4 bbb
board add 0/5 aaa
#
Expected value from bool(re.match(...)): True

I tried several regexes, but each of them fails on either test case (2) or test case (3).

Tried regex                                             Failed on test
(board add 0/1)? (?(1) (aaa|bbb))                       2
^(?:(?!board add 0/1).)*$|board add 0/1 (?:aaa|bbb)     2
board add 0/1 (aaa|bbb)                                 3
(?=board add 0/1 )(?:board add 0/1 (aaa|bbb))           3

Is it possible to write a regex that passes all of the test cases above?

You can check them at the following URL:

https://regex101.com/r/2l2Qd4/1

NOTE:

  • I only want to check one particular interface, board add 0/1, rather than board add 0/\d
    • In my actual use case, different interfaces may need different models. That is why I am looking for a regex specific to board add 0/1, which I can then extend to board add 0/2 through board add 0/21 one by one
  • Requirements for a valid string
    • If board add 0/1 exists in the string, it must be followed by (aaa|bbb); otherwise, the string is invalid
    • If board add 0/1 does not exist in the string, the string is valid

CodePudding user response:

In that case, you can use this regex

board add 0/\d+ (?!aaa|bbb)

If the regex matches then the string is invalid.

Python Example

import re


strings = [
    """xxxx
xxxx
 board add 0/1 aaa
 board add 0/2 aaa
 board add 0/3 bbb
 board add 0/4 bbb
 board add 0/5 aaa
#""",
    """xxxx
xxxx
 board add 0/1 xxx
 board add 0/2 aaa
 board add 0/3 bbb
 board add 0/4 aaa
 board add 0/5 bbb
#""",
    """xxxx
xxxx
 board add 0/2 aaa
 board add 0/3 bbb
 board add 0/4 bbb
 board add 0/5 aaa
#"""
]

for string in strings:
    # A match means some board line is followed by a model other than aaa or bbb.
    print(not bool(re.search(r"board add 0/\d+ (?!aaa|bbb)", string)))

Output

True
False
True

Explanation

re.search scans the string for the pattern and returns a match object for the first match, or None if there is no match. The solution works by looking for an invalid line rather than a valid one: if a board add 0/<number> entry is followed by anything other than aaa or bbb, the pattern matches and the string is invalid. Every other string passes, as you described in your previous question. So when re.search returns anything other than None, not bool(...) converts that result to the expected False.

NOTE: I'm using not bool(...) as the string is valid if it does not contain the pattern.
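
The question asks for a check on one specific interface (board add 0/1) with possibly different models per interface, while the pattern above accepts any interface number. A minimal sketch of how the same negative-lookahead idea could be narrowed is shown below. The helper names invalid_pattern and is_valid, the rules dictionary, and the shortened sample strings are assumptions made for illustration; they are not part of the original answer.

```python
import re


def invalid_pattern(interface, models):
    """Build a pattern that matches only when `interface` has a disallowed model."""
    alternatives = "|".join(re.escape(m) for m in models)
    # (?!\d) stops "0/1" from also matching the start of "0/10".
    # (?! +(?:...)\b) makes the pattern match only when the interface is NOT
    # followed by spaces and one of the allowed model names as a whole word.
    return re.compile(
        rf"board add {re.escape(interface)}(?!\d)(?! +(?:{alternatives})\b)"
    )


def is_valid(config, allowed):
    """Return True when no interface listed in `allowed` has a disallowed model."""
    return not any(
        invalid_pattern(interface, models).search(config)
        for interface, models in allowed.items()
    )


# Shortened versions of the three test cases from the question.
good = "xxxx\n board add 0/1 aaa\n board add 0/2 aaa\n#"
bad = "xxxx\n board add 0/1 xxx\n board add 0/2 aaa\n#"
missing = "xxxx\n board add 0/2 aaa\n#"

rules = {"0/1": ["aaa", "bbb"]}  # hypothetical per-interface rules
for text in (good, bad, missing):
    print(is_valid(text, rules))
```

With these inputs the loop prints True, False, True. Adding further entries to rules (for example "0/2") would extend the check interface by interface, which is the extension the question describes.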

CodePudding user response:

You seem to want to match a block of indented board lines that each end with either aaa or bbb, enclosed between lines that start with a non-whitespace character.

To prevent partial matches, you need to anchor on the line before and the line after the repeating board part.

^\S.*(?:\n[^\S\n]+board add 0/\d+ (?:aaa|bbb))+\n\S

Explanation

  • ^ Start of string
  • \S.* Match a non whitespace char and the rest of the line
  • (?: Non capture group to repeat as a whole part
    • \n[^\S\n]+ Match a newline followed by 1+ spaces
    • board add 0/\d+ (?:aaa|bbb) Match the board pattern where \d+ matches 1+ digits
  • )+ Close the non capture group and repeat 1+ times to match at least a single line
  • \n\S Match a newline and a non whitespace char
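
A short sketch of how this block pattern could be applied in Python follows. It assumes the re.MULTILINE flag and re.search: the sample strings begin with two xxxx lines, so without the flag ^ would anchor only at the very start of the string and the second xxxx line would block the match. The flags used in the original demo are not shown here, so treat this as one plausible setup; the names BLOCK and block_is_valid are illustrative.

```python
import re

# With re.MULTILINE, ^ may anchor at the start of any line, so the match can
# begin on the unindented line directly above the indented board block.
BLOCK = re.compile(
    r"^\S.*(?:\n[^\S\n]+board add 0/\d+ (?:aaa|bbb))+\n\S",
    re.MULTILINE,
)


def block_is_valid(text):
    """True when an unindented line is followed by one or more valid board
    lines and then by another unindented line."""
    return bool(BLOCK.search(text))


naive = "xxxx\nxxxx\n board add 0/1 aaa\n board add 0/2 aaa\n#"
bad_model = "xxxx\nxxxx\n board add 0/1 xxx\n board add 0/2 aaa\n#"
missing = "xxxx\nxxxx\n board add 0/2 aaa\n board add 0/3 bbb\n#"

for text in (naive, bad_model, missing):
    print(block_is_valid(text))
```

This prints True, False, True for the three samples. Note that the pattern validates every indented board line in the block, not only board add 0/1, and that a string containing no board lines at all would not match.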
