I'm trying to split a given string wherever a pair of quotes encloses at least 3 characters (I also have to split wherever a . or , appears). So something like hello'example'hello,cat should return [hello;example;hello;cat]. I came up with:
re.split("\'(...+)\'|\.|,","hello'example'hello,cat")
This works fine with the quotes, but whenever it splits on . or , this happens:
['hello', 'example', 'hello', None, 'cat']
I found out that the capture group is what causes the None in the middle of the list, but it is the only way I know to keep the quoted content. Please keep in mind that I need to do as little computation as possible, because the program has to work with huge files. I'm also not very experienced with Python, so sorry if I got something obvious wrong.
CodePudding user response:
Try just:
re.split("\'|\.|,", "hello'example'hello,cat")
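Note that this splits on every single quote, no matter how many characters sit between a pair, so it only fits if the 3-character minimum can be dropped. A quick check of that behaviour; the second input is a made-up example, and the name simple is just for illustration:

```python
import re

# Split on every quote, period, or comma (no minimum length between quotes).
# Compiling once avoids re-parsing the pattern for each line of a large file.
simple = re.compile(r"'|\.|,")

result = simple.split("hello'example'hello,cat")

# Hypothetical input with only two characters between the quotes:
# it is split as well, because the pattern does not look at the content.
short_result = simple.split("a'bc'd")

print(result)
print(short_result)
```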
CodePudding user response:
It's tricky because the opening and closing quotes are the exact same character. I think you'd have to use a negative lookbehind to exclude any single quote that is preceded by 0, 1 or 2 characters and another single quote. In addition, you'd have to use a positive lookahead. This pattern works in JavaScript:
re.split("(?<!'(.?|..))'(?=[^']{3,})|\.|\,", "hello'example'hello,cat")
But Python's re module doesn't support variable-length lookbehinds, so the pattern is rejected there. Also, this won't work if the input contains a lone single quote (an apostrophe).
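For Python, one option that avoids lookbehinds is to keep the capture group from the question and drop the None entries afterwards. This is only a minimal sketch: the names SPLITTER and split_keep_quoted are made up, and using [^'] instead of . for the quoted content is an assumption (it stops a match from running across another quote).

```python
import re

# A quoted run of 3+ non-quote characters is captured, so re.split keeps it;
# '.' and ',' are plain delimiters and are discarded.
# Assumption: quoted content never contains a quote itself.
SPLITTER = re.compile(r"'([^']{3,})'|[.,]")

def split_keep_quoted(text):
    # When the delimiter matched was '.' or ',', the capture group did not
    # take part in the match and re.split emits None for it; filter those out.
    return [part for part in SPLITTER.split(text) if part is not None]

parts = split_keep_quoted("hello'example'hello,cat")

# Hypothetical input: fewer than 3 characters between the quotes,
# so only the comma acts as a delimiter.
short = split_keep_quoted("a'bc'd,e")

print(parts)
print(short)
```

The filter is a single pass over the result, and compiling the pattern once keeps the per-line cost low when reading a large file line by line.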