I have this pattern, ([\d.] |\W )
, but it doesn't capture parenthesis and double quotes together. For example, I have this:
var a = "hello world"
print("hello", a")
I want it to capture it like this (including whitespace):
var, a, =, ", hello, world, ", print, (, ", hello, ", ,, a, )
CodePudding user response:
try this
import re
regex = re.compile(r'\s*(:?\w |\W)')
txt='''
var a = "hello world"
print("hello", a")
'''.replace('\n','')
print(regex.findall(txt))
CodePudding user response:
While I love regular expressions and frequently use them, you'll often find that they're better paired with another parsing method if you have a complex body to parse (rather than, say, a verifier for a cli arg or html field), and cutting down the source to something more manageable by splitting on some collection of characters will make your parser much more robust
The plural of regex is regrets
-- ifosteve
Though it's not very clear from your question, you may really be looking for an Abstract Syntax Tree parser!
Further, based upon your use of var
, you may have JavaScript source
take a look at these, though I can't comment on their quality