How would you search for data between two parentheses on many new lines

I tried looking this up all over Stack Overflow with no luck. I have this test file:

declare -a IP_LIST=(
  'x.x.x.x/24'
  'x.x.x.x/24'
  'x.x.x.x/24'
  'x.x.x.x/24'
  'x.x.x.x/24'
  'x.x.x.x/28'
)

I need to grab the IPs in the middle of this and put them into an array (the rest of the file is just more Bash code).

I tried something like the following, but it just returns None:

list=re.search("IP_LIST=\(\n[\n.]*\)", out1)

CodePudding user response：

If you don't mind using multiple steps, this could be a solution:

import re

out1 = """declare -a IP_LIST=(
    'x.x.x.x/24'
    'x.x.x.x/24'
    'x.x.x.x/24'
    'x.x.x.x/24'
    'x.x.x.x/24'
    'x.x.x.x/28'
)"""

data = re.search(r"IP_LIST=\(((\s*'([x./0-9]+)')+)", out1)
print(data.group(1))

You might want to remove the "x" from [x./0-9] when matching real IP addresses.

This will give you (leading whitespace aside)

'x.x.x.x/24'
'x.x.x.x/24'
'x.x.x.x/24'
'x.x.x.x/24'
'x.x.x.x/24'
'x.x.x.x/28'

Based on that result we can continue with simple string operations

result = [ip.strip("' ") for ip in data.group(1).strip().split('\n')]
print(result)

The result now is ['x.x.x.x/24', 'x.x.x.x/24', 'x.x.x.x/24', 'x.x.x.x/24', 'x.x.x.x/24', 'x.x.x.x/28'].

If you want to remove the subnet too then change the line where you create the result to

result = [ip.split('/')[0].strip("' ") for ip in data.group(1).strip().split('\n')]
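As a rough sketch of how the steps above might look with real addresses, the example below drops the "x" from the character set and splits off the prefix length. The `script` contents, the sample addresses (taken from the documentation-only ranges), and the variable names are assumptions made for illustration, not part of the original file.

```python
import re

# Hypothetical sample file; the addresses come from documentation-only ranges.
script = """#!/bin/bash
declare -a IP_LIST=(
  '192.0.2.0/24'
  '198.51.100.0/24'
  '203.0.113.16/28'
)
echo "more bash code"
"""

# Same pattern as above, with the placeholder "x" dropped from the set.
match = re.search(r"IP_LIST=\(((\s*'([./0-9]+)')+)", script)
if match is None:
    raise ValueError("IP_LIST block not found")

# One entry per line; strip the quotes and the indentation.
entries = [line.strip("' ") for line in match.group(1).strip().split("\n")]

# Keep only the part before the "/" to drop the subnet prefix.
addresses = [entry.split("/")[0] for entry in entries]

print(entries)
print(addresses)
```

Checking for `None` before calling `group(1)` avoids an AttributeError when the block is missing from the file.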

To explain the regular expression as requested in the comment:

We have IP_LIST=\(((\s*'([x./0-9]+)')+). IP_LIST=\( matches "IP_LIST=(". This leaves us with ((\s*'([x./0-9]+)')+). The outer parentheses are there to define the group that will contain the text we want to get later. Take those away and we have (\s*'([x./0-9]+)')+.

Let's first focus on \s*'([x./0-9]+)'. \s* matches any run of whitespace characters (space, tab, newline, ...), including none at all. Then we have a literal '. The group that follows contains a character set, [x./0-9], where 0-9 covers the digits 0 through 9. As you can see, all those characters are used in the IP addresses from your example (as I said, the "x" is due to the example data and might be dropped in the final code). The characters in this set can and will be repeated multiple times, so we add the plus sign: [x./0-9]+. The final character is again a literal '.

Now comes the fun part. We have just explained \s*'([x./0-9]+)', which defines some whitespace followed by an IP address. This might be repeated several times over multiple lines. So we wrap this in parentheses and add a + to allow repetitions of this part. Now we're back to ((\s*'([x./0-9]+)')+).
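The breakdown above can also be written into the pattern itself with re.VERBOSE, which ignores whitespace in the pattern and allows comments. This is only a sketch: the names `ENTRY_LIST`, `sample`, `found`, and `items` are made up for the example, and the inner group is non-capturing here because only the outer group is read.

```python
import re

ENTRY_LIST = re.compile(
    r"""
    IP_LIST=\(         # the literal text IP_LIST=(
    (                  # group 1: the whole run of entries
      (?:              # one entry (non-capturing in this sketch)
        \s*            # optional whitespace, newlines included
        '              # opening quote
        [x./0-9]+      # one or more address characters
        '              # closing quote
      )+               # one or more entries
    )
    """,
    re.VERBOSE,
)

# Shortened, hypothetical version of the example data.
sample = """declare -a IP_LIST=(
  'x.x.x.x/24'
  'x.x.x.x/28'
)"""

found = ENTRY_LIST.search(sample)
# split() with no arguments splits on any whitespace, newlines included.
items = [item.strip("'") for item in found.group(1).split()]
print(items)
```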

CodePudding user response：

Using the regex module from PyPI, you can make use of the \G anchor:

(?:^declare -a IP_LIST=\(|(?!^)\G)\s*'([^']+)'(?=[^()]*\))

Code:

import regex

rx = r"(?:^declare -a IP_LIST=\(|(?!^)\G)\s*'([^']+)'(?=[^()]*\))"

s = ("declare -a IP_LIST=(\n"
    "  'x.x.x.x/24'\n"
    "  'x.x.x.x/24'\n"
    "  'x.x.x.x/24'\n"
    "  'x.x.x.x/24'\n"
    "  'x.x.x.x/24'\n"
    "  'x.x.x.x/28'\n"
    ")")

print(regex.findall(rx, s))

Output:

['x.x.x.x/24', 'x.x.x.x/24', 'x.x.x.x/24', 'x.x.x.x/24', 'x.x.x.x/24', 'x.x.x.x/28']

RegEx Details:

(?:: Start non-capture group
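- ^declare -a IP_LIST=\(: Match declare -a IP_LIST=( at the start of the string (pass regex.MULTILINE to let ^ match at the start of any line)
- |: OR
- (?!^)\G: Continue from the position where the previous match ended; (?!^) stops \G from also matching at the very start of the string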
): End non-capture group
\s*: Match optional whitespace characters
': Match opening '
([^']+): Match the text between the quotes and capture it in group #1
': Match closing '
(?=[^()]*\)): Positive lookahead to ensure that we have a closing ) ahead of the current position without matching ( or ) in between
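If installing a third-party module is not an option, a similar result can probably be reached in two steps with the standard re module. This is a sketch of an alternative, not the \G technique itself. It first isolates the block with [^)]*, a negated class that also matches newlines. The question's [\n.]* does not behave that way, because a dot inside a character class is a literal dot. The names `text`, `block`, and `ips`, and the extra `OTHER` line, are made up for the example.

```python
import re

# Hypothetical file contents with unrelated Bash code around the list.
text = """#!/bin/bash
declare -a IP_LIST=(
  'x.x.x.x/24'
  'x.x.x.x/28'
)
OTHER=('not-an-ip')
"""

# Step 1: capture everything between "IP_LIST=(" and the next ")".
# re.MULTILINE lets ^ match at the start of any line in the file.
block = re.search(r"^declare -a IP_LIST=\(([^)]*)\)", text, re.MULTILINE)

# Step 2: pull each quoted value out of the captured block only.
ips = re.findall(r"'([^']+)'", block.group(1)) if block else []
print(ips)
```

Because the second step only sees the captured block, quoted strings elsewhere in the file, such as the one in `OTHER`, are not picked up.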