Regex to split a string except in double quotes and keep the delimiters

Time:08-24

I have been sifting through very similar questions, but I am still stumped. I need to split a string on any non-alphanumeric character and keep the delimiters, except for parts of the string in double quotes, which should stay intact. Hence, for:

string = 'let a = 5 * (other) if x is "constant";'
re.split(pattern, string)

should yield:

['let', 'a', '=', '5', '*', '(', 'other', ')', 'if', 'x', 'is', '"constant"', ';']

I am getting pretty close with:

re.split(r"(\W)", string)

(except for whitespace, which I filter out separately), but I cannot handle the double-quoted parts.

Any help appreciated.
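For reference, here is a minimal sketch of the attempt described in the question. The whitespace filtering is done with a list comprehension, which is an assumption about how the asker removes whitespace. It shows why the approach fails: `\W` also matches the `"` characters, so the quoted literal comes out as three separate tokens.

```python
import re

s = 'let a = 5 * (other) if x is "constant";'

# Split on every non-word character; the capturing group keeps each delimiter.
parts = re.split(r"(\W)", s)

# Assumed whitespace filter: drop empty strings and whitespace-only delimiters.
tokens = [p for p in parts if p.strip()]

print(tokens)
# ['let', 'a', '=', '5', '*', '(', 'other', ')', 'if', 'x', 'is', '"', 'constant', '"', ';']
```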

Answer:

Instead of splitting, you can match the tokens directly with re.findall:

import re
s = 'let a = 5 * (other) if x is "constant";'
print(re.findall(r'"[^"]*"|\w+|[^\w\s]', s))


Details:

  • "[^"]*" - a double quote, then zero or more characters other than ", then a closing double quote
  • | - or
  • \w+ - one or more word characters
  • | - or
  • [^\w\s] - a single character that is neither a word character nor whitespace.
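Since the question asked for re.split specifically, here is a sketch showing that the same alternatives also work as a split pattern, provided the quoted-literal and punctuation branches sit in a capturing group and bare whitespace is matched outside it. The function names `tokenize_findall` and `tokenize_split` are hypothetical, introduced only for this illustration. The last line shows the benefit of matching quotes as a unit: a quoted phrase containing a space stays one token.

```python
import re

# A quoted literal, a run of word characters, or one other non-space character.
TOKEN = re.compile(r'"[^"]*"|\w+|[^\w\s]')

# Same idea as a split pattern: quoted literals and punctuation are captured, so
# re.split keeps them; whitespace is matched outside the group, so it is consumed
# and its group value comes back as None.
SPLITTER = re.compile(r'("[^"]*"|[^\w\s])|\s+')


def tokenize_findall(text):
    # Hypothetical helper: return every token matched by TOKEN.
    return TOKEN.findall(text)


def tokenize_split(text):
    # Hypothetical helper: drop None (whitespace) and '' (gaps between adjacent delimiters).
    return [part for part in SPLITTER.split(text) if part]


s = 'let a = 5 * (other) if x is "constant";'
print(tokenize_findall(s))
# ['let', 'a', '=', '5', '*', '(', 'other', ')', 'if', 'x', 'is', '"constant"', ';']
print(tokenize_split(s))
# ['let', 'a', '=', '5', '*', '(', 'other', ')', 'if', 'x', 'is', '"constant"', ';']
print(tokenize_findall('say "hello world";'))
# ['say', '"hello world"', ';']
```

The findall version is usually simpler to read; the split version is mainly useful if you want to stay close to the original re.split attempt.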

Answer:

re.split(r'[ ]|(?<=[(])|(?=[);])', string)
  • [ ] - split on space
  • (?<=[(]) - split after '('
  • (?=[);]) - split before ')' or before ';'

['let', 'a', '=', '5', '*', '(', 'other', ')', 'if', 'x', 'is', '"constant"', ';']
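This pattern depends on the tokens already being separated by spaces, and splitting on zero-width lookaround matches requires Python 3.7 or later. The sketch below checks it against the question's input and two made-up inputs (assumptions, not from the question) to show where it stops working: a quoted literal containing a space is cut apart, and operators written without spaces are not split.

```python
import re

# Split on a space, after '(', or before ')' / ';' (zero-width splits need Python 3.7+).
PATTERN = r'[ ]|(?<=[(])|(?=[);])'

print(re.split(PATTERN, 'let a = 5 * (other) if x is "constant";'))
# ['let', 'a', '=', '5', '*', '(', 'other', ')', 'if', 'x', 'is', '"constant"', ';']

# The space inside the quotes is treated as a delimiter too.
print(re.split(PATTERN, 'say "hello world";'))
# ['say', '"hello', 'world"', ';']

# Operators written without surrounding spaces are not split at all.
print(re.split(PATTERN, 'a=5*b;'))
# ['a=5*b', ';']
```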
