Find a closing brace pair, skipping internal open/close pairs, in Python

Time:10-23

I have a string that starts "{{ABC..." and which will also contain a closing brace pair. The problem is that it may also contain nested open/close brace pairs, and may contain further brace pairs after the matching closing pair. E.g. this is possible:

    {{ABC foo bar {{baz}} {{fred}} foo2}} other text {{other brace pair}}

In this case I would want the string up to "foo2}}". I can see how to do this by writing my own recursive function, but is there a way to match this in a single pass?

CodePudding user response:

You can find all enclosed substrings by scanning the input string only once.

The only thing you need is to track the current nesting depth, i.e. the number of left braces that have not been closed yet. Increase it when you see a left brace and decrease it when you see a right brace. When it drops back to 0, you have a complete enclosed substring.

def find_enclosed_string(string):
    left_brace_cnt = 0  # current nesting depth
    enclosed_list = []
    enclosed_str_range = [0, 0]  # [start, end] indices of the current top-level span
    for i, s in enumerate(string):
        if s == "{":
            if left_brace_cnt == 0:
                enclosed_str_range[0] = i  # a new top-level span starts here
            left_brace_cnt += 1
        elif s == "}":
            left_brace_cnt -= 1
            if left_brace_cnt == 0:
                enclosed_str_range[1] = i  # the top-level span ends here
        if enclosed_str_range[1] > enclosed_str_range[0]:
            enclosed_list.append(string[enclosed_str_range[0]:enclosed_str_range[1] + 1])
            enclosed_str_range = [0, 0]
    return enclosed_list

string = "{{ABC foo bar {{baz}} {{fred}} foo2}} other text {{other brace pair}}"

find_enclosed_string(string)

# ['{{ABC foo bar {{baz}} {{fred}} foo2}}', '{{other brace pair}}']
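If only the leading "{{ABC..." span is wanted, as in the question, the same depth counter can stop at the first point where the depth returns to 0. The following is a minimal sketch under that assumption; the function name `match_leading_braces` and its `prefix` parameter are made up for illustration. It counts single braces, so a stray "{" or "}" inside the text would shift the result.

```python
def match_leading_braces(text, prefix="{{ABC"):
    """Return the balanced brace span at the start of text, or None."""
    if not text.startswith(prefix):
        return None
    depth = 0  # number of braces opened but not yet closed
    for i, ch in enumerate(text):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # First return to depth 0: the leading span is complete.
                return text[: i + 1]
    return None  # the braces never balanced


sample = "{{ABC foo bar {{baz}} {{fred}} foo2}} other text {{other brace pair}}"
print(match_leading_braces(sample))
```

Because the loop returns as soon as the depth reaches 0, the rest of the string is never scanned.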

CodePudding user response:

The regex module from PyPI supports recursion. To target {{ABC, use a lookahead followed by a group that contains the recursed pattern. At (?1), the pattern contained in the first group gets pasted in.

(?={{ABC)({(?>[^}{]+|(?1))*})

See this demo at regex101 or a Python demo at tio.run


The (?> atomic group ) prevents running into catastrophic backtracking on unbalanced braces.
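As a rough sketch of how this pattern might be used from Python: it assumes the third-party regex package is installed (pip install regex), since the standard-library re module does not support (?1) recursion. The literal braces are escaped here to keep the pattern unambiguous, and the names `PATTERN` and `result` are only illustrative.

```python
import regex  # third-party package from PyPI, not the standard-library re

# Lookahead anchors the match at "{{ABC"; group 1 matches one balanced
# "{...}" and recurses into itself via (?1) for nested braces.
PATTERN = regex.compile(r"(?=\{\{ABC)(\{(?>[^}{]+|(?1))*\})")

sample = "{{ABC foo bar {{baz}} {{fred}} foo2}} other text {{other brace pair}}"
match = PATTERN.search(sample)
result = match.group(1) if match else None
print(result)
```

With the sample string from the question, `result` should be the span ending at "foo2}}"; the trailing "{{other brace pair}}" is skipped because the lookahead requires "{{ABC".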
