I have been sifting through very similar questions, but I am still stumped. I need to split a string on any non-alphanumeric character and keep the delimiters, except for parts of the string in double quotes, which should stay whole. For example, for:
string = 'let a = 5 * (other) if x is "constant";'
re.split(pattern, string)
should yield:
['let', 'a', '=', '5', '*', '(', 'other', ')', 'if', 'x', 'is', '"constant"', ';']
I am getting pretty close with:
re.split(r"(\W)", string)
(except for whitespace, which I filter out separately), but I cannot get the double-quoted part to stay in one piece.
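Running that attempt on the sample string shows the problem. This is a minimal sketch; the list comprehension that drops whitespace is an assumption about how the separate filtering is done. Because \W also matches each double quote, "constant" comes out as three items instead of one.

```python
import re

string = 'let a = 5 * (other) if x is "constant";'

# Split on every non-word character; the capturing group makes
# re.split return each delimiter as its own item.
parts = re.split(r"(\W)", string)

# Drop the empty strings and whitespace-only items re.split produces.
tokens = [p for p in parts if p.strip()]
print(tokens)
# ['let', 'a', '=', '5', '*', '(', 'other', ')', 'if', 'x', 'is', '"', 'constant', '"', ';']
```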
Any help appreciated.
Answer:
You can use
import re
s = 'let a = 5 * (other) if x is "constant";'
print(re.findall(r'"[^"]*"|\w+|[^\w\s]', s))
Details:
- "[^"]*" - a ", zero or more chars other than ", and then a "
- | - or
- \w+ - one or more word chars
- | - or
- [^\w\s] - a char other than a word char or a whitespace char.
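The same pattern can be written with re.VERBOSE so that each alternative carries its own comment. This is only a restatement of the regex above; the name TOKEN and the second input (a quoted string containing a space) are made up here for illustration.

```python
import re

# Same three alternatives as above, tried left to right.
TOKEN = re.compile(r'''
    "[^"]*"     # a double-quoted string, quotes included
  | \w+         # or a run of one or more word characters
  | [^\w\s]     # or one char that is neither a word char nor whitespace
''', re.VERBOSE)

s = 'let a = 5 * (other) if x is "constant";'
print(TOKEN.findall(s))
# ['let', 'a', '=', '5', '*', '(', 'other', ')', 'if', 'x', 'is', '"constant"', ';']

print(TOKEN.findall('say "hello world";'))
# ['say', '"hello world"', ';']
```

Because findall only returns matched text, whitespace between tokens is skipped automatically, so no separate filtering step is needed.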
Answer:
re.split(r'[ ]|(?<=[(])|(?=[);])', string)
- [ ] - split on space
- (?<=[(]) - split after '('
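- (?=[);]) - split before ')' or before ';'
Splitting on zero-width matches such as these two lookarounds requires Python 3.7 or later. Output:
['let', 'a', '=', '5', '*', '(', 'other', ')', 'if', 'x', 'is', '"constant"', ';']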
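A sketch that stays closer to the question's own re.split(r"(\W)", ...) attempt: put a quoted-string alternative first inside the capturing group, so a complete "..." is captured as one delimiter before \W can match a lone quote. The helper name tokenize and the second input are hypothetical, chosen for illustration.

```python
import re

def tokenize(text):
    # Alternation is tried left to right, so a whole "..." string wins
    # over \W matching the opening quote on its own.
    parts = re.split(r'("[^"]*"|\W)', text)
    # re.split also returns empty strings and whitespace delimiters; drop them.
    return [p for p in parts if p.strip()]

string = 'let a = 5 * (other) if x is "constant";'
print(tokenize(string))
# ['let', 'a', '=', '5', '*', '(', 'other', ')', 'if', 'x', 'is', '"constant"', ';']

print(tokenize('x = "a b";'))
# ['x', '=', '"a b"', ';']
```

Unlike the space-based pattern above, this keeps a quoted string that contains a space in one piece: on 'x = "a b";' the space-based split gives ['x', '=', '"a', 'b"', ';'].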