I am trying to figure out how to split a string by comma, except when the comma is inside brackets and except when a dash comes directly before and/or after the comma. I have already found some good solutions for the bracket problem, but I have no idea how to extend them to my case.
Here is an example:
example_string = 'A-la-carte-Küche, Garnieren (Speisen, Getränke), Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung'
aim = ['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)', 'Kosten-, Leistungsrechnung', 'Berufsausbildung, -fortbildung']
So far, I have managed to do the following:
>>> import re
>>> out = re.split(r',\s*(?![^()]*\))', example_string)
>>> out
['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)', 'Kosten-', 'Leistungsrechnung', 'Berufsausbildung', '-fortbildung']
Note the difference between aim and out for the terms 'Kosten-, Leistungsrechnung' and 'Berufsausbildung, -fortbildung'. I would be glad if someone could help me get output that looks like aim.
Thanks in advance!
Alex
CodePudding user response:
If you can make use of the third-party Python regex module (not the built-in re module), you could do:
\([^()]*\)(*SKIP)(*F)|(?<!-)\s*,\s*(?!,)
The pattern matches:
\([^()]*\) - match from an opening till a closing parenthesis
(*SKIP)(*F) - skip the match
| - or
(?<!-)\s*,\s*(?!,) - match a comma between optional whitespace chars to split on, not preceded by a - and not followed by another comma
import regex
example_string = 'A-la-carte-Küche, Garnieren (Speisen, Getränke), Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung'
print(regex.split(r"\([^()]*\)(*SKIP)(*F)|(?<!-)\s*,\s*(?!,)", example_string))
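Output
['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)', 'Kosten-, Leistungsrechnung', 'Berufsausbildung', '-fortbildung']
Note that this keeps 'Kosten-, Leistungsrechnung' together but still splits before '-fortbildung', because the pattern only excludes a comma that directly follows a dash, not one that is followed by a dash.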
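If the regex module is not available, the same "match what to skip, or match the separator" idea can be approximated with the built-in re module. The following is only a sketch: the names TOKEN, SENTINEL and split_outside_parens are made up for illustration, the separator alternative uses (?!\s*-) instead of (?!,) so that a dash after the comma also blocks the split (my adaptation, not the pattern from the answer above), and it assumes the sentinel character never occurs in the input.

```python
import re

# Alternative 1: a whole (non-nested) parenthesised group, which is kept untouched.
# Alternative 2: a separator comma, not preceded by "-" and not followed
#                by optional whitespace and "-".
TOKEN = re.compile(r"\([^()]*\)|(?<!-)\s*,(?!\s*-)\s*")
SENTINEL = "\x00"  # assumption: this character does not appear in the input


def split_outside_parens(text):
    """Replace separator commas with a sentinel, then split on the sentinel."""
    def mark(match):
        chunk = match.group(0)
        # Parenthesised groups are put back unchanged; separators become the sentinel.
        return chunk if chunk.startswith("(") else SENTINEL

    return TOKEN.sub(mark, text).split(SENTINEL)


example_string = 'A-la-carte-Küche, Garnieren (Speisen, Getränke), Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung'
print(split_outside_parens(example_string))
```

Because the parenthesised group is consumed by the first alternative, the comma inside it is never tried as a separator, which is roughly what (*SKIP)(*F) achieves in the regex module.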
CodePudding user response:
You can use
re.split(r'(?<!-),(?!\s*-)\s*(?![^()]*\))', example_string)
See the Python demo. Details:
(?<!-) - a negative lookbehind that fails the match if there is a - char immediately to the left of the current location
, - a comma
(?!\s*-) - a negative lookahead that fails the match if there are zero or more whitespaces followed by a - char immediately to the right of the current location
\s* - zero or more whitespaces
(?![^()]*\)) - a negative lookahead that fails the match if there are zero or more chars other than ( and ), and then a ) char, immediately to the right of the current location.
See the regex demo, too.
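For reuse, the pattern can be compiled once and checked against the expected list from the question. This is a minimal sketch: SPLIT_RE and split_terms are illustrative names, not part of any library, and the pattern is not expected to handle nested parentheses, since the last lookahead only looks for the next ) without an intervening (.

```python
import re

# Comma not preceded by "-", not followed by optional whitespace and "-",
# and not inside a (non-nested) pair of parentheses.
SPLIT_RE = re.compile(r'(?<!-),(?!\s*-)\s*(?![^()]*\))')


def split_terms(text):
    """Split text on the commas accepted by SPLIT_RE."""
    return SPLIT_RE.split(text)


example_string = 'A-la-carte-Küche, Garnieren (Speisen, Getränke), Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung'
aim = ['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)', 'Kosten-, Leistungsrechnung', 'Berufsausbildung, -fortbildung']

print(split_terms(example_string) == aim)  # True
```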