Splitting a string based on multiple delimiters using split() function in Python by ignoring certain

Time:11-17

I am not getting the desired result when splitting a string on multiple delimiters under specific conditions.

I tried running the code below:

import re
text = r'ced"|"ms|n"|4|98'
finallist = re.split(r'"\|"|"\||\|', text)

Here I'm trying to split the string on three delimiters, joined with the regex OR operator (|). The first delimiter is "|", the second is "|, and the third is |.

finallist looks like this:

finallist=['ced', 'ms', 'n', '4', '98']

However, I don't want the function to split at the pipe in ms|n. That pipe sits between letters enclosed in double quotes (in this case "ms|n"), so I don't want it to be matched as a delimiter.

I expect finallist to look like this:

finallist=['ced', 'ms|n', '4', '98']

Is there any way I can achieve this by changing the logic in the split function? Please let me know.

CodePudding user response:

You can use

"?\|(?!(?:(?<=[A-Za-z]\|)|(?<=[A-Za-z]\\\|))(?=[a-zA-Z]))"?

See the regex demo. Details:

  • "? - an optional " char
  • \| - a | char
  • (?!(?:(?<=[A-Za-z]\|)|(?<=[A-Za-z]\\\|))(?=[a-zA-Z])) - a negative lookahead that fails the match if there is an ASCII letter immediately after the | char AND either an ASCII letter right before the | char or an ASCII letter followed by a \ char right before the | char
  • "? - an optional " char
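
For readability, the same pattern can be laid out with re.VERBOSE so that each part described in the list carries its own comment. This is only a sketch of the pattern above in a different layout; the names PATTERN and split_fields are made up for illustration.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# PATTERN and split_fields are illustrative names, not from the answer.
PATTERN = re.compile(r'''
    "?                          # optional quote before the pipe
    \|                          # the pipe itself
    (?!                         # do NOT treat it as a delimiter if ...
        (?:
            (?<=[A-Za-z]\|)     # ... a letter sits right before the pipe
          |
            (?<=[A-Za-z]\\\|)   # ... or a letter followed by a backslash does
        )
        (?=[A-Za-z])            # ... and a letter comes right after the pipe
    )
    "?                          # optional quote after the pipe
''', re.VERBOSE)


def split_fields(text):
    """Split text on the delimiter described by PATTERN."""
    return PATTERN.split(text)


print(split_fields(r'ced"|"ms|n"|4|98'))
# => ['ced', 'ms|n', '4', '98']
```

Each lookbehind has a fixed width (two and three characters), which Python's re module requires.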

See the Python demo:

import re

pattern = r'"?\|(?!(?:(?<=[A-Za-z]\|)|(?<=[A-Za-z]\\\|))(?=[a-zA-Z]))"?'

# The pipe between letters (ms|n) is not treated as a delimiter
text = r'ced"|"ms|n"|4|98'
print(re.split(pattern, text))
# => ['ced', 'ms|n', '4', '98']

# Same input with a backslash before the inner pipe
text = r'ced"|"ms\|n"|4|98'
print(re.split(pattern, text))
# => ['ced', 'ms\\|n', '4', '98']
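
Note that the pattern decides by the letters around the pipe, not by the quotes themselves. The quick check below probes that rule. The second and third inputs are hypothetical and were chosen only for illustration; they do not come from the question.

```python
import re

# Pattern copied from the answer above.
PATTERN = r'"?\|(?!(?:(?<=[A-Za-z]\|)|(?<=[A-Za-z]\\\|))(?=[a-zA-Z]))"?'

# Hypothetical inputs (apart from the first), used only to probe the rule.
samples = [
    r'ced"|"ms|n"|4|98',   # letters on both sides of the inner pipe: kept
    r'ced"|"ms|4"|98',     # digit after the inner pipe: split despite the quotes
    r'ab|cd',              # unquoted, letters on both sides: not split
]

results = {s: re.split(PATTERN, s) for s in samples}
for sample, parts in results.items():
    print(sample, '->', parts)
# ced"|"ms|n"|4|98 -> ['ced', 'ms|n', '4', '98']
# ced"|"ms|4"|98 -> ['ced', 'ms', '4', '98']
# ab|cd -> ['ab|cd']
```

So this approach fits data where a pipe inside quotes always sits between letters and real delimiters never do. If that does not hold for your input, a different rule would be needed.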