I am trying to write a function that splits a string on multiple delimiters in hierarchical order: try the first delimiter, then the second, then the third, and so on.
This question seemingly provides a solution, specifically linking this comment.
# Split the team names, with a hierarchical delimiter
def split_new(inp, delims=['VS', '/ ' ,'/']):
    # https://stackoverflow.com/questions/67574893/python-split-string-by-multiple-delimiters-following-a-hierarchy
    for d in delims:
        result = inp.split(d, maxsplit=1)
        if len(result) == 2:
            return result
        else:
            return [inp] # If nothing worked, return the input
test_strs = ['STACK/ OVERFLOW', 'STACK #11/00 VS OVERFLOW', 'STACK/OVERFLOW' ]
for ts in test_strs:
    res = split_new(ts)
    print(res)
"""
Output:
['STACK/ OVERFLOW']
['STACK #11/00 ', ' OVERFLOW']
['STACK/OVERFLOW']
Expected:
['STACK',' OVERFLOW']
['STACK #11/00 ', ' OVERFLOW']
['STACK', 'OVERFLOW']
"""
However, my results are not as expected. What am I missing?
CodePudding user response:
Execute the "nothing worked" fallback AFTER trying all delimiters, i.e. move the final return out of the loop:
def split_new(inp, delims=['VS', '/ ' ,'/']):
    for d in delims:
        result = inp.split(d, maxsplit=1)
        if len(result) == 2:
            return result
    return [inp] # If nothing worked, return the input
CodePudding user response:
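As an alternative to looping over the delimiters, you might use a single regular expression with an alternation (|). The negative lookahead (?!\d) keeps a / that is followed by a digit, as in #11/00, from being treated as a delimiter. Note that re.split splits at every match from left to right, so it does not follow the delimiter hierarchy.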
import re
test_strs = ['STACK/ OVERFLOW', 'STACK #11/00 VS OVERFLOW', 'STACK/OVERFLOW' ]
pattern = r"/(?!\d)|VS"
for s in test_strs:
    print(re.split(pattern, s))
Output:
['STACK', ' OVERFLOW']
['STACK #11/00 ', ' OVERFLOW']
['STACK', 'OVERFLOW']
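To make the difference between the alternation and the hierarchical loop concrete, here is a minimal sketch. The input "A/B VS C" and the helper name split_hier are assumptions for illustration and are not from the question; split_hier is just the corrected loop from the other answers.

```python
import re

pattern = r"/(?!\d)|VS"

def split_hier(inp, delims=("VS", "/ ", "/")):
    # Corrected loop: try each delimiter in priority order.
    for d in delims:
        result = inp.split(d, maxsplit=1)
        if len(result) == 2:
            return result
    return [inp]  # No delimiter matched

s = "A/B VS C"  # hypothetical input containing two different delimiters

regex_all = re.split(pattern, s)                # splits at every match
regex_first = re.split(pattern, s, maxsplit=1)  # splits at the leftmost match only
hier = split_hier(s)                            # splits on the highest-priority delimiter

print(regex_all)    # ['A', 'B ', ' C']
print(regex_first)  # ['A', 'B VS C']
print(hier)         # ['A/B ', ' C']
```

So the regex gives the same results as the loop for the three test strings, but the two can diverge when a string contains more than one kind of delimiter.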
CodePudding user response:
This happens because the else branch returns on the first iteration of the loop: when splitting on 'VS' does not produce two parts, the function immediately returns [inp] and never tries the remaining delimiters.
The right way of doing it is:
def split_new(inp, delims=['VS', '/ ' ,'/']):
    for d in delims:
        result = inp.split(d, maxsplit=1)
        if len(result) == 2:
            return result
    return [inp] # If nothing worked, return the input
test_strs = ['STACK/ OVERFLOW', 'STACK #11/00 VS OVERFLOW', 'STACK/OVERFLOW' ]
for ts in test_strs:
    res = split_new(ts)
    print(res)
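As a variation on the same idea, here is a minimal sketch that uses str.partition instead of str.split, which makes the "was the delimiter found?" check explicit. The name split_partition and the tuple default are my own choices for illustration, not part of the question.

```python
def split_partition(inp, delims=("VS", "/ ", "/")):
    """Split inp on the first delimiter, in priority order, that occurs in it."""
    for d in delims:
        head, sep, tail = inp.partition(d)
        if sep:  # sep is non-empty only when the delimiter was found
            return [head, tail]
    return [inp]  # No delimiter matched, return the input unchanged

for ts in ['STACK/ OVERFLOW', 'STACK #11/00 VS OVERFLOW', 'STACK/OVERFLOW']:
    print(split_partition(ts))
```

Note that because '/ ' is tried before '/', the first test string is split on '/ ', so the second part comes back as 'OVERFLOW' without a leading space.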