Hierarchical string delimiting not splitting

Time:11-04

I am trying to create a function to parse a string based on multiple delimiters, but in a hierarchical format: i.e., try the first delimiter, then the second, then the third, etc.

This Stack Overflow question (the URL is in the code comment below) seemingly provides a solution, specifically in one of its comments.

# Split the team names, with a hierarchical delimiter
def split_new(inp, delims=['VS', '/ ' ,'/']):
    # https://stackoverflow.com/questions/67574893/python-split-string-by-multiple-delimiters-following-a-hierarchy
    for d in delims:
        result = inp.split(d, maxsplit=1)
        if len(result) == 2: 
            return result
        else:
            return [inp] # If nothing worked, return the input  

test_strs = ['STACK/ OVERFLOW', 'STACK #11/00 VS OVERFLOW', 'STACK/OVERFLOW' ]

for ts in test_strs:
    res = split_new(ts)
    print(res)

"""
Output:
['STACK/ OVERFLOW']
['STACK #11/00 ', ' OVERFLOW']
['STACK/OVERFLOW']

Expected:
['STACK',' OVERFLOW']
['STACK #11/00 ', ' OVERFLOW']
['STACK', 'OVERFLOW']

"""

However, my results are not as expected. What am I missing?

CodePudding user response:

Execute the "nothing worked" fallback AFTER trying all delimiters:

for d in delims:
    result = inp.split(d, maxsplit=1)
    if len(result) == 2: 
        return result
return [inp] # If nothing worked, return the input  
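To see why the original version never gets past the first delimiter, here is a small sketch that keeps the question's logic but records which delimiters were actually tried. The name split_buggy and the tried list are hypothetical, added only for illustration.

```python
def split_buggy(inp, delims=('VS', '/ ', '/')):
    """Same control flow as the question's function, plus a trace."""
    tried = []
    for d in delims:
        tried.append(d)
        result = inp.split(d, maxsplit=1)
        if len(result) == 2:
            return result, tried
        else:
            # This branch runs as soon as the FIRST delimiter fails,
            # so the loop never reaches '/ ' or '/'.
            return [inp], tried

result, tried = split_buggy('STACK/OVERFLOW')
print(result, tried)  # ['STACK/OVERFLOW'] ['VS']
```

Only 'VS' is ever tried, which is why moving the fallback return out of the loop fixes the problem.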

CodePudding user response:

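As an alternative, instead of looping over the delimiters, you might use a single regex pattern with an alternation (|). Here /(?!\d) matches a slash that is not followed by a digit, so the slash in #11/00 is left alone: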

import re

test_strs = ['STACK/ OVERFLOW', 'STACK #11/00 VS OVERFLOW', 'STACK/OVERFLOW' ]
pattern = r"/(?!\d)|VS"
for s in test_strs:
    print(re.split(pattern, s))

Output

['STACK', ' OVERFLOW']
['STACK #11/00 ', ' OVERFLOW']
['STACK', 'OVERFLOW']
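Note that the alternation is not strictly hierarchical: re.split splits at every match, whichever alternative it comes from, whereas the loop stops at the first delimiter that works. The sketch below compares the two on a made-up input; the string 'STACK/A VS OVERFLOW' and the helper name split_hier are assumptions for illustration, not part of the question.

```python
import re

def split_hier(inp, delims=('VS', '/ ', '/')):
    """Loop version: try each delimiter in priority order."""
    for d in delims:
        result = inp.split(d, maxsplit=1)
        if len(result) == 2:
            return result
    return [inp]

pattern = r"/(?!\d)|VS"
s = 'STACK/A VS OVERFLOW'  # contains both a slash and 'VS'

regex_parts = re.split(pattern, s)
loop_parts = split_hier(s)

print(regex_parts)  # ['STACK', 'A ', ' OVERFLOW']
print(loop_parts)   # ['STACK/A ', ' OVERFLOW']
```

If inputs can contain more than one kind of delimiter, the loop is the safer choice when priority matters.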

CodePudding user response:

This happens because the function always returns on the first iteration of the loop: when the split on 'VS' does not succeed, the else branch returns [inp] immediately, so the remaining delimiters are never tried.

The right way to do it is:

def split_new(inp, delims=['VS', '/ ' ,'/']):
    for d in delims:
        result = inp.split(d, maxsplit=1)
        if len(result) == 2: 
            return result
        
    return [inp] # If nothing worked, return the input  

test_strs = ['STACK/ OVERFLOW', 'STACK #11/00 VS OVERFLOW', 'STACK/OVERFLOW' ]

for ts in test_strs:
    res = split_new(ts)
    print(res)
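For reference, a self-contained sketch of the corrected approach with its results checked. The name split_hierarchical is hypothetical, and the tuple default is a small assumption-level tweak to avoid a mutable default argument; the logic is otherwise the same as above.

```python
def split_hierarchical(inp, delims=('VS', '/ ', '/')):
    """Split on the first delimiter (in priority order) that occurs in inp."""
    for d in delims:
        result = inp.split(d, maxsplit=1)
        if len(result) == 2:
            return result
    # Reached only after every delimiter has been tried.
    return [inp]

test_strs = ['STACK/ OVERFLOW', 'STACK #11/00 VS OVERFLOW', 'STACK/OVERFLOW']
results = [split_hierarchical(ts) for ts in test_strs]
for res in results:
    print(res)
# ['STACK', 'OVERFLOW']
# ['STACK #11/00 ', ' OVERFLOW']
# ['STACK', 'OVERFLOW']
```

Note that the first result is ['STACK', 'OVERFLOW'] rather than ['STACK', ' OVERFLOW']: the '/ ' delimiter is tried before '/', so it consumes the space as well.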