Not getting the desired result when splitting a string on multiple delimiters under specific conditions.
I tried executing the code below:
import re
text = r'ced"|"ms|n"|4|98'
# Raw string, so \| reaches the regex engine unchanged (avoids an invalid-escape warning)
finallist = re.split(r'"\|"|"\||\|', text)
Here I'm trying to split the string on three delimiters joined with the regex OR operator (|): the first is "|", the second is "| and the third is a bare |.
finallist looks like this:
finallist=['ced', 'ms', 'n', '4', '98']
However, I don't want the function to split at the pipe in ms|n. That pipe sits inside a value enclosed in double quotes (here "ms|n"), so it should not be treated as a delimiter.
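To see why the original pattern splits ms|n, it can help to list what each alternative actually matches. This sketch reuses the question's three alternatives (written as a raw string) with re.findall. Alternation is tried left to right at each position, so "|" wins over "| where both could match, and the bare | alternative also fires on the pipe inside ms|n.

```python
import re

text = r'ced"|"ms|n"|4|98'
# The question's three delimiters, tried left to right: "|" then "| then |
delims = r'"\|"|"\||\|'

# findall returns every delimiter occurrence that re.split would cut on
matches = re.findall(delims, text)
print(matches)  # ['"|"', '|', '"|', '|']
```

The second entry is the pipe inside "ms|n"; nothing in the pattern tells it apart from a real delimiter.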
Instead, I expect finallist to look like this:
finallist=['ced', 'ms|n', '4', '98']
Is there any way to achieve this by changing the pattern passed to re.split? Please let me know.
Answer:
You can use
"?\|(?!(?:(?<=[A-Za-z]\|)|(?<=[A-Za-z]\\\|))(?=[a-zA-Z]))"?
See the regex demo. Details:
- "? - an optional " char
- \| - a | char
- (?!(?:(?<=[A-Za-z]\|)|(?<=[A-Za-z]\\\|))(?=[a-zA-Z])) - a negative lookahead that fails the match if there is an ASCII letter immediately after the | char AND either an ASCII letter, or an ASCII letter followed by a backslash (\), right before the | char
- "? - an optional " char
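To check the negative lookahead on its own, one can attach it to a bare pipe (dropping the optional quotes) and test a few small strings. The sample strings here are made up for illustration; True means the pipe would be treated as a delimiter.

```python
import re

# The answer's negative lookahead attached to a bare pipe (no optional quotes)
guard = r'\|(?!(?:(?<=[A-Za-z]\|)|(?<=[A-Za-z]\\\|))(?=[a-zA-Z]))'

# Hypothetical samples: letter|letter, letter\|letter, and mixed cases
samples = ['a|b', r'a\|b', 'a|1', '1|b', '"|"']

# True means the pipe matches, i.e. re.split would cut there
results = {s: re.search(guard, s) is not None for s in samples}
for s, is_delim in results.items():
    print(repr(s), is_delim)
```

Only a pipe with a letter (or letter plus backslash) on the left and a letter on the right is protected; every other pipe still counts as a delimiter.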
See the Python demo:
import re
text = r'ced"|"ms|n"|4|98'
pattern = r'"?\|(?!(?:(?<=[A-Za-z]\|)|(?<=[A-Za-z]\\\|))(?=[a-zA-Z]))"?'
print(re.split(pattern, text))
# => ['ced', 'ms|n', '4', '98']
text = r'ced"|"ms\|n"|4|98'
print(re.split(pattern, text))
# => ['ced', 'ms\\|n', '4', '98']
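One thing worth checking before relying on this: the lookahead decides based on the letters around the pipe, not on whether the pipe sits inside double quotes. A quoted value whose pipe is flanked by digits is therefore still split. The input below is a hypothetical variation of the question's sample.

```python
import re

pattern = r'"?\|(?!(?:(?<=[A-Za-z]\|)|(?<=[A-Za-z]\\\|))(?=[a-zA-Z]))"?'

# Hypothetical input: the quoted field has digits, not letters, around its pipe
text = r'ced"|"12|34"|4|98'
parts = re.split(pattern, text)
print(parts)  # ['ced', '12', '34', '4', '98']
```

Widening the letter classes to include digits would keep "12|34" together, but it would also stop 4|98 from being split, so this pattern fits data where protected pipes always sit between letters.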