How to recursively expand nested braces as cartesian product-CodePudding

(I deliberately removed the "Python" tag someone added so as to keep this question language-agnostic.)

I think I managed to solve this iteratively by unwinding a stack for each brace, which is a solution I'm happy with (assuming it works). But I failed to implement a single function recursion. Can someone please help? What I tried turned out too complex for me to envision, with how different scenarios for appending rather than expanding are needed depending on the relationship between the commas and braces.

Is there an elegant recursion for something like this? I would also appreciate any recommendations for other types of established parsing/language methods I could study for this kind of problem.

Problem description: given a string of letters, valid curly braces, and commas; expand the braces so that the result includes all possibilities of the substring of letters immediately to the left of the braces concatenated with each of the comma-separated substrings of letters inside the braces. Braces can be nested.

Examples:

"abc{d,e}f{g,hi}" => ['abcdfg', 'abcdfhi', 'abcefg', 'abcefhi']


"abc{d{0,1},e}f{g{0{a,b},1,2},hi}" =>

['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
 'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
 'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']

(Hopefully working) iterative Python code:

# Brace expansion
# Example:
# input "abc{d,e}f{g,hi}"
# output ["abcdfg", "abcdfhi", "abcefg", "abcefhi"]
def f(s):
  stack = []

  i = 0

  while i < len(s):
    j = i

    while s[j] not in ['{', '}', ',']:
      j  = 1

    if s[i:j]:
      stack.append([s[i:j]])

    i = j

    if s[i] == '{':
      stack.append('{')
      i  = 1

    if s[i] == ',':
      i  = 1

    # Unwind stack
    if s[i] == '}':
      # Get comma, separated elements
      lst = []
      while stack[-1] != '{':
        lst.extend(reversed(stack.pop()))
      lst = reversed(lst)
      stack.pop() # Remove '{'
      pfxs = reversed(stack.pop())
      stack.append([pfx   sfx for pfx in pfxs for sfx in lst])
      i  = 1

  # Cartesian product of stack
  result = []

  for i in range(1, len(stack)):
    result = [pfx   sfx for pfx in stack[i-1] for sfx in stack[i]]
    
  return result

Output:

s = "abc{d,e}f{g,hi}"

print(s)
print(f(s))

s = "abc{d{0,1},e}f{g{0{a,b},1,2},hi}"

print("")
print(s)
print(f(s))

"""
abc{d,e}f{g,hi}
['abcdfg', 'abcdfhi', 'abcefg', 'abcefhi']

abc{d{0,1},e}f{g{0{a,b},1,2},hi}
['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
 'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
 'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']
"""

CodePudding user response：

Forethought

I read "cartesian product" so of course my first reflex is to import the itertools module.

Secondly, due to the nested nature of braces, I am going to use recursion.

Thirdly, there are two things we can split the string on: commas and braces. So I am going to write two mutually recursive functions: one to expand on commas, and one to expand on braces.

Function expand_commas will attempt to split the string on commas. Function expand_braces will assume that the string does not contain commas at the upper level (that is, it could have commas hidden inside braces, but no visible commas outside of braces).

Fourthly, class str has a method .split which might have been convenient to split the strings on , and on { and }; but In can't use it, because I want the split to ignore nested characters. So I will have to write my own functions get_commas and get_braces to detect only non-nested commas and braces.

Python code

import itertools

# returns a pair of indices:
#   the index of the first '{' in the string;
#   the index of its matching '}'
def get_braces(s):
  i = s.index('{')
  j = i   1
  depth = 0
  while j < len(s) and (depth > 0 or s[j] != '}'):
    if s[j] == '{':
      depth  = 1
    elif s[j] == '}':
      depth -= 1
    j  = 1
  return i,j

# returns a list of indices:
#   the indices of every ',' in the string
#   except ',' that are hidden inside braces
def get_commas(s):
  depth = 0
  commas = []
  for i,c in enumerate(s):
    if c == '{':
      depth  = 1
    elif c == '}':
      depth -= 1
    elif depth == 0 and c == ',':
      commas.append(i)
  return commas

# splits on commas
# make recursive calls on substrings
# returns list of alternatives
def expand_commas(s):
  commas = get_commas(s)
  if commas:
    return list(itertools.chain.from_iterable(expand_braces(s[i 1:j]) for i,j in zip([-1] commas, commas [len(s)])))
  else:
    return expand_braces(s)

# assumes no commas outside of braces
# splits on first '{' and its matching '}'
# make recursive calls, return cartesian product
def expand_braces(s):
  if '{' not in s:
    return [s]
  else:
    i,j = get_braces(s)
    infixes = expand_commas(s[i 1:j])
    suffixes = expand_braces(s[j 1:])
    return list(''.join(p) for p in itertools.product([s[:i]], infixes, suffixes))

Tests

# balanced string
print(expand_commas("abc{d{0,1},e}f{g{0{a,b},1,2},hi}"))
# ['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
#  'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
#  'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']

# only commas
print(expand_commas("a,b,c"))
# ['a', 'b', 'c']

# mismatched braces: missing closing brace
print(expand_commas("abc{d{0,1},e}f{g{0{a,b"))
# ['abcd0fg0a', 'abcd0fg0b', 'abcd1fg0a', 'abcd1fg0b', 'abcefg0a', 'abcefg0b']

print(expand_commas("abc{d{0,1},e}f{g{0{a,b}}}"))
# ['abcd0fg0a', 'abcd0fg0b', 'abcd1fg0a', 'abcd1fg0b', 'abcefg0a', 'abcefg0b']

# mismatched braces: extra closing brace
print(expand_commas("abc{d,e}}{f,g}}"))
# ['abcd}f', 'abce}f', 'g}}']

# empty braces
print(expand_commas("abc{}d"))
# ['abcd']

# empty commas
print(expand_commas("abc{,,,}d"))
# ['abcd', 'abcd', 'abcd', 'abcd']

CodePudding user response：

To differentiate between the case where recursion is initiated by a comma or by a brace, I would suggest to pass as argument the prefixes that should be prefixed to whatever the recursive call will process.

I would also tokenize the input using a regular expression.

Here is how that would look:

import re
from itertools import product 

def expand(s):
    tokens = re.finditer(r"[^{},] |.|$", s)
    
    def recur(prefixes):
        token = next(tokens).group(0) or "}"
        if token == "}":
            return prefixes
        elif token == ",":
            return prefixes   recur([""])
        elif token == "{":
            return recur([a   b for a, b in product(prefixes, recur([""]))])
        else:
            return recur([prefix   token for prefix in prefixes])

    return recur([""])

Example call:

s = "abc{d{0,1},e}f{g{0{a,b},1,2},hi}"

print(expand(s))

Output:

['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi', 
 'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi', 
 'abcefg0a',  'abcefg0b',  'abcefg1',  'abcefg2',  'abcefhi']

Version with input validation

If the solution should raise a controlled exception when the braces are not balanced, then extend the function to this:

def expand(s):
    tokens = re.finditer(r"[^{},] |.|$", s)
    
    def recur(prefixes, until):
        token = next(tokens).group(0)
        if token == until:
            return prefixes
        elif token == ",":
            return prefixes   recur([""], until)
        elif token == "{":
            return recur([a   b for a, b in product(prefixes, recur([""], "}"))], until)
        elif token == "}":
            raise ValueError("Unexpected closing brace")
        elif token:
            return recur([prefix   token for prefix in prefixes], until)
        else:
            raise ValueError("Missing closing brace")
    
    return recur([""], "")

CodePudding user response：

Thanks to trincot's brilliant insights, I was able to come up with the single recursive function I was after.

Python code:

# Returns expanded braces and the
# ending index for the processed section.
def f(s, prefixes=[''], i=0):
  if i == len(s):
    return prefixes, i

  j = i

  while j < len(s) and s[j] not in ['{', '}', ',']:
    j  = 1

  new_prefixes = [pfx   s[i:j] for pfx in prefixes]

  if j == len(s):
    return new_prefixes, j
  
  if s[j] == '}':
    return new_prefixes, j   1

  if s[j] == '{':
    suffixes, next_idx = f(s, [''], j   1)
    expanded = [pfx   sfx for pfx in new_prefixes for sfx in suffixes]
    return f(s, expanded, next_idx)

  if s[j] == ',':
    words, next_idx = f(s, [''], j   1)
    return new_prefixes   words, next_idx