Split a string by comma except when in bracket and except when directly before and/or after the comma is a dash

Time:03-12

I'm trying to figure out how to split a string by comma, except when the comma is inside brackets AND except when a dash comes directly before and/or after the comma. I have already found some good solutions for the bracket problem, but I have no idea how to extend them to my case.

Here is an example:

example_string = 'A-la-carte-Küche, Garnieren (Speisen, Getränke), Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung'
aim = ['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)', 'Kosten-, Leistungsrechnung', 'Berufsausbildung, -fortbildung']

So far, I have managed to do the following:

>>> import re
>>> out = re.split(r',\s*(?![^()]*\))', example_string)
>>> out
['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)', 'Kosten-', 'Leistungsrechnung', 'Berufsausbildung', '-fortbildung']

Note the difference between aim and out for the terms 'Kosten-, Leistungsrechnung' and 'Berufsausbildung, -fortbildung'. I would be glad if someone could help me get the output to look like aim.
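
To make the difference concrete, here is a small diagnostic sketch (not part of the original attempt) that lists the character directly before and after every comma the current pattern splits on. The helper names split_contexts and unwanted_splits are made up for illustration.

```python
import re

example_string = ('A-la-carte-Küche, Garnieren (Speisen, Getränke), Kosten-, '
                  'Leistungsrechnung, Berufsausbildung, -fortbildung')

# The pattern from the attempt above: a comma plus trailing whitespace,
# as long as it is not followed by a closing parenthesis of the same group.
attempt = re.compile(r',\s*(?![^()]*\))')

def split_contexts(text):
    """Return (char before, char after) for each separator the pattern matches.

    Slices are used instead of indexing so a comma at the very start or end
    of the text yields an empty string rather than an IndexError.
    """
    return [(text[m.start() - 1:m.start()], text[m.end():m.end() + 1])
            for m in attempt.finditer(text)]

def unwanted_splits(text):
    """Keep only the separators that touch a dash on either side."""
    return [ctx for ctx in split_contexts(text) if '-' in ctx]

print(split_contexts(example_string))
print(unwanted_splits(example_string))
```

The two separators reported by unwanted_splits are exactly the ones that break 'Kosten-, Leistungsrechnung' and 'Berufsausbildung, -fortbildung' apart, so those are the cases an extended pattern has to exclude.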

Thanks in advance!
Alex

CodePudding user response:

If you can use the third-party regex module for Python (pip install regex) instead of the built-in re, you could do:

\([^()]*\)(*SKIP)(*F)|(?<!-)\s*,(?!\s*-)\s*

The pattern matches:

  • \([^()]*\) Match from an opening to the closing parenthesis
  • (*SKIP)(*F) Discard that match and continue after it, so commas inside parentheses are never split on
  • | Or
  • (?<!-)\s*,(?!\s*-)\s* Match a comma with optional surrounding whitespace to split on, unless a dash directly precedes the comma or follows it (after optional whitespace)

import regex

example_string = 'A-la-carte-Küche, Garnieren (Speisen, Getränke), Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung'
print(regex.split(r"\([^()]*\)(*SKIP)(*F)|(?<!-)\s*,(?!\s*-)\s*", example_string))

Output

['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)', 'Kosten-, Leistungsrechnung', 'Berufsausbildung, -fortbildung']

CodePudding user response:

You can use

re.split(r'(?<!-),(?!\s*-)\s*(?![^()]*\))', example_string)

Details:

  • (?<!-) - a negative lookbehind that fails the match if there is a - char immediately to the left of the current location
  • , - a comma
  • (?!\s*-) - a negative lookahead that fails the match if there are zero or more whitespaces followed by a - char immediately to the right of the current location
  • \s* - zero or more whitespaces
  • (?![^()]*\)) - a negative lookahead that fails the match if there are zero or more chars other than ) and ( and then a ) char immediately to the right of the current location, i.e. if the comma sits inside a (non-nested) pair of parentheses.
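
The breakdown above can be checked with a short, self-contained sketch. It uses the same pattern, written in re.VERBOSE mode so each part carries its own comment; the names SPLIT_RE and split_terms are hypothetical and only there for illustration.

```python
import re

# Same pattern as in the answer, spread over several lines.
# In VERBOSE mode unescaped whitespace and #-comments are ignored.
SPLIT_RE = re.compile(r"""
    (?<!-)          # no dash immediately before the comma
    ,               # the comma itself
    (?!\s*-)        # no dash after the comma, even past whitespace
    \s*             # consume whitespace following the comma
    (?![^()]*\))    # not followed by a ")" without a "(" or ")" in between
""", re.VERBOSE)

def split_terms(text):
    """Split on commas that are outside parentheses and not next to a dash."""
    return SPLIT_RE.split(text)

example_string = ('A-la-carte-Küche, Garnieren (Speisen, Getränke), Kosten-, '
                  'Leistungsrechnung, Berufsausbildung, -fortbildung')
aim = ['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)',
       'Kosten-, Leistungsrechnung', 'Berufsausbildung, -fortbildung']

print(split_terms(example_string) == aim)
```

Note that the parenthesis check only looks for the next ) with no ( in between, so it assumes parentheses are balanced and not nested; nested groups would need a different approach.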
