(I deliberately removed the "Python" tag someone added so as to keep this question language-agnostic.)
I think I managed to solve this iteratively by unwinding a stack for each brace, which is a solution I'm happy with (assuming it works). But I failed to implement a single function recursion. Can someone please help? What I tried turned out too complex for me to envision, with how different scenarios for appending rather than expanding are needed depending on the relationship between the commas and braces.
Is there an elegant recursion for something like this? I would also appreciate any recommendations for other types of established parsing/language methods I could study for this kind of problem.
Problem description: given a string of letters, valid curly braces, and commas; expand the braces so that the result includes all possibilities of the substring of letters immediately to the left of the braces concatenated with each of the comma-separated substrings of letters inside the braces. Braces can be nested.
Examples:
"abc{d,e}f{g,hi}" => ['abcdfg', 'abcdfhi', 'abcefg', 'abcefhi']
"abc{d{0,1},e}f{g{0{a,b},1,2},hi}" =>
['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']
(Hopefully working) iterative Python code:
# Brace expansion
# Example:
# input "abc{d,e}f{g,hi}"
# output ["abcdfg", "abcdfhi", "abcefg", "abcefhi"]
def f(s):
stack = []
i = 0
while i < len(s):
j = i
while s[j] not in ['{', '}', ',']:
j = 1
if s[i:j]:
stack.append([s[i:j]])
i = j
if s[i] == '{':
stack.append('{')
i = 1
if s[i] == ',':
i = 1
# Unwind stack
if s[i] == '}':
# Get comma, separated elements
lst = []
while stack[-1] != '{':
lst.extend(reversed(stack.pop()))
lst = reversed(lst)
stack.pop() # Remove '{'
pfxs = reversed(stack.pop())
stack.append([pfx sfx for pfx in pfxs for sfx in lst])
i = 1
# Cartesian product of stack
result = []
for i in range(1, len(stack)):
result = [pfx sfx for pfx in stack[i-1] for sfx in stack[i]]
return result
Output:
s = "abc{d,e}f{g,hi}"
print(s)
print(f(s))
s = "abc{d{0,1},e}f{g{0{a,b},1,2},hi}"
print("")
print(s)
print(f(s))
"""
abc{d,e}f{g,hi}
['abcdfg', 'abcdfhi', 'abcefg', 'abcefhi']
abc{d{0,1},e}f{g{0{a,b},1,2},hi}
['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']
"""
CodePudding user response:
Forethought
I read "cartesian product" so of course my first reflex is to import the itertools module.
Secondly, due to the nested nature of braces, I am going to use recursion.
Thirdly, there are two things we can split the string on: commas and braces. So I am going to write two mutually recursive functions: one to expand on commas, and one to expand on braces.
Function expand_commas
will attempt to split the string on commas. Function expand_braces
will assume that the string does not contain commas at the upper level (that is, it could have commas hidden inside braces, but no visible commas outside of braces).
Fourthly, class str
has a method .split
which might have been convenient to split the strings on ,
and on {
and }
; but In can't use it, because I want the split to ignore nested characters. So I will have to write my own functions get_commas
and get_braces
to detect only non-nested commas and braces.
Python code
import itertools
# returns a pair of indices:
# the index of the first '{' in the string;
# the index of its matching '}'
def get_braces(s):
i = s.index('{')
j = i 1
depth = 0
while j < len(s) and (depth > 0 or s[j] != '}'):
if s[j] == '{':
depth = 1
elif s[j] == '}':
depth -= 1
j = 1
return i,j
# returns a list of indices:
# the indices of every ',' in the string
# except ',' that are hidden inside braces
def get_commas(s):
depth = 0
commas = []
for i,c in enumerate(s):
if c == '{':
depth = 1
elif c == '}':
depth -= 1
elif depth == 0 and c == ',':
commas.append(i)
return commas
# splits on commas
# make recursive calls on substrings
# returns list of alternatives
def expand_commas(s):
commas = get_commas(s)
if commas:
return list(itertools.chain.from_iterable(expand_braces(s[i 1:j]) for i,j in zip([-1] commas, commas [len(s)])))
else:
return expand_braces(s)
# assumes no commas outside of braces
# splits on first '{' and its matching '}'
# make recursive calls, return cartesian product
def expand_braces(s):
if '{' not in s:
return [s]
else:
i,j = get_braces(s)
infixes = expand_commas(s[i 1:j])
suffixes = expand_braces(s[j 1:])
return list(''.join(p) for p in itertools.product([s[:i]], infixes, suffixes))
Tests
# balanced string
print(expand_commas("abc{d{0,1},e}f{g{0{a,b},1,2},hi}"))
# ['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
# 'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
# 'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']
# only commas
print(expand_commas("a,b,c"))
# ['a', 'b', 'c']
# mismatched braces: missing closing brace
print(expand_commas("abc{d{0,1},e}f{g{0{a,b"))
# ['abcd0fg0a', 'abcd0fg0b', 'abcd1fg0a', 'abcd1fg0b', 'abcefg0a', 'abcefg0b']
print(expand_commas("abc{d{0,1},e}f{g{0{a,b}}}"))
# ['abcd0fg0a', 'abcd0fg0b', 'abcd1fg0a', 'abcd1fg0b', 'abcefg0a', 'abcefg0b']
# mismatched braces: extra closing brace
print(expand_commas("abc{d,e}}{f,g}}"))
# ['abcd}f', 'abce}f', 'g}}']
# empty braces
print(expand_commas("abc{}d"))
# ['abcd']
# empty commas
print(expand_commas("abc{,,,}d"))
# ['abcd', 'abcd', 'abcd', 'abcd']
CodePudding user response:
To differentiate between the case where recursion is initiated by a comma or by a brace, I would suggest to pass as argument the prefixes that should be prefixed to whatever the recursive call will process.
I would also tokenize the input using a regular expression.
Here is how that would look:
import re
from itertools import product
def expand(s):
tokens = re.finditer(r"[^{},] |.|$", s)
def recur(prefixes):
token = next(tokens).group(0) or "}"
if token == "}":
return prefixes
elif token == ",":
return prefixes recur([""])
elif token == "{":
return recur([a b for a, b in product(prefixes, recur([""]))])
else:
return recur([prefix token for prefix in prefixes])
return recur([""])
Example call:
s = "abc{d{0,1},e}f{g{0{a,b},1,2},hi}"
print(expand(s))
Output:
['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']
Version with input validation
If the solution should raise a controlled exception when the braces are not balanced, then extend the function to this:
def expand(s):
tokens = re.finditer(r"[^{},] |.|$", s)
def recur(prefixes, until):
token = next(tokens).group(0)
if token == until:
return prefixes
elif token == ",":
return prefixes recur([""], until)
elif token == "{":
return recur([a b for a, b in product(prefixes, recur([""], "}"))], until)
elif token == "}":
raise ValueError("Unexpected closing brace")
elif token:
return recur([prefix token for prefix in prefixes], until)
else:
raise ValueError("Missing closing brace")
return recur([""], "")
CodePudding user response:
Thanks to trincot's brilliant insights, I was able to come up with the single recursive function I was after.
Python code:
# Returns expanded braces and the
# ending index for the processed section.
def f(s, prefixes=[''], i=0):
if i == len(s):
return prefixes, i
j = i
while j < len(s) and s[j] not in ['{', '}', ',']:
j = 1
new_prefixes = [pfx s[i:j] for pfx in prefixes]
if j == len(s):
return new_prefixes, j
if s[j] == '}':
return new_prefixes, j 1
if s[j] == '{':
suffixes, next_idx = f(s, [''], j 1)
expanded = [pfx sfx for pfx in new_prefixes for sfx in suffixes]
return f(s, expanded, next_idx)
if s[j] == ',':
words, next_idx = f(s, [''], j 1)
return new_prefixes words, next_idx