Split only when quotes contain 3 characters or more

Time:05-31

I'm trying to split a given string wherever a pair of quotes encloses at least 3 characters (I also have to split wherever . or , appears). So, something like hello'example'hello,cat should return [hello;example;hello;cat]. I came up with:

re.split(r"'(...+)'|\.|,", "hello'example'hello,cat")

This works fine with the quotes, but whenever it splits on . or , this happens:

['hello', 'example', 'hello', None, 'cat']

I found out that the capture group is what causes the None in the middle of the list, but it is the only way I know to keep the quoted content. Please keep in mind that I have to do as few computations as possible because the program has to work with huge files. Also, I'm not very experienced with Python, so sorry if I did something obviously wrong.

CodePudding user response:

Try just:

re.split(r"'|\.|,", "hello'example'hello,cat")
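Note that this pattern splits at every quote, so it does not enforce the three-character minimum. A minimal sketch that keeps the capture group from the question and simply drops the None entries might look like the following. The helper name split_keep_quoted is hypothetical, and the pattern assumes the quoted content never contains a quote itself (hence [^'] instead of the dot used in the question).

```python
import re

# Delimiters: a quoted run of 3+ non-quote characters, or a single '.' or ','.
# Assumption: quoted content contains no quote characters.
PATTERN = re.compile(r"'([^']{3,})'|[.,]")

def split_keep_quoted(text):
    """Split on '.', ',' and on quote pairs enclosing at least 3 characters."""
    # re.split emits None for the capture group whenever '.' or ',' matched;
    # testing `is not None` drops those entries but keeps empty strings.
    return [part for part in PATTERN.split(text) if part is not None]

print(split_keep_quoted("hello'example'hello,cat"))
# ['hello', 'example', 'hello', 'cat']
print(split_keep_quoted("a'b'c,d"))
# ["a'b'c", 'd']
```

The filter is a single pass over the result, so the extra cost over a plain re.split should be small.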

CodePudding user response:

It's tricky because the opening quote and the closing quote are the exact same character. I think you'd have to use a negative lookbehind to exclude any single quote that is preceded by 0, 1 or 2 characters and another single quote. In addition, you'd have to use a positive lookahead. This pattern works in JavaScript:

re.split("(?<!'(.?|..))'(?=[^']{3,})|\.|\,", "hello'example'hello,cat")

But Python's built-in re module only supports fixed-width lookbehinds, so it rejects this pattern. Also, this won't work if there is a lone single quote (an apostrophe).
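One way to sidestep lookbehind in Python is to scan for the delimiters with re.finditer and yield the pieces between them. This is only a sketch under the same assumptions as the question: iter_tokens is a hypothetical name, quoted content is assumed to contain no quotes, and the lone-apostrophe problem is not solved here either. Being a generator, it avoids building the whole result list at once, which may help with large inputs.

```python
import re

# A quoted run of 3+ non-quote characters, or a single '.' or ','.
DELIM = re.compile(r"'([^']{3,})'|[.,]")

def iter_tokens(text):
    """Lazily yield the pieces of `text` separated by the delimiters."""
    pos = 0
    for match in DELIM.finditer(text):
        # Text between the previous delimiter and this one.
        yield text[pos:match.start()]
        # Group 1 is set only when the delimiter was a quoted run.
        if match.group(1) is not None:
            yield match.group(1)
        pos = match.end()
    # Whatever follows the last delimiter.
    yield text[pos:]

print(list(iter_tokens("hello'example'hello,cat")))
# ['hello', 'example', 'hello', 'cat']
```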
