I have a string similar to:
"'a b | c'\,\,\, 'd | e f' ,,, 'g | h"
I want to use re.split to get the following list:
["a b|c", "d|e f", "g|h"]
I have tried the following, but it does not give the output I want. Essentially, I need to get rid of everything except the letters and the pipe character, and then split. One complication is that both ' and " are sometimes used as quotes:
re.compile(r'[\"\',][\W ]', re.UNICODE).split(txt.lower())
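For reference, this is what the attempt above returns on the sample string. It assumes the string is taken literally with its backslashes, written as a raw string so Python keeps them. Each match consumes only two characters, so a run such as \,\,\, produces several adjacent matches and therefore empty strings. An opening quote followed by a letter never matches, so the leading quotes survive:

```python
import re

# Sample string from the question, taken literally (backslashes included).
txt = r"'a b | c'\,\,\, 'd | e f' ,,, 'g | h"

# The attempt from the question: split wherever a quote or comma is
# followed by exactly one non-word character.
attempt = re.compile(r'[\"\',][\W ]', re.UNICODE).split(txt.lower())
print(attempt)
# ["'a b | c", '', '', '', "'d | e f", '', '', "'g | h"]
```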
Answer:
Remove the spaces around | as a separate step after splitting.
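import re
split = re.compile(r'[\"\',][\W ]', re.UNICODE).split(txt.lower())
# drop empty pieces and leftover quotes, then remove the spaces around |
cleaned = [re.sub(r'\s*\|\s*', '|', x.strip('\'" ')) for x in split if x.strip('\'" ')]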
Answer:
I don't think you can do this with split alone. You probably can't get rid of the first quote without ending up with an empty first item.
Here is one attempt, but it fails to remove the initial quote ('):
re.split(r"(?<=.)'[^']+'", txt)
output: ["'a b | c", 'd | e f', 'g | h']
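If you want to stay with re.split, the leftover quote can be dealt with after the fact. The sketch below assumes the sample string is taken literally and that quotes only appear around items. It shows two options: strip quote characters from each piece, or let the pattern also consume the opening quote (the ^' alternative is an addition, not part of the answer) and drop the empty first item this produces, as predicted above:

```python
import re

# Sample string from the question, taken literally; a raw string keeps the
# backslashes without escape-sequence warnings.
txt = r"'a b | c'\,\,\, 'd | e f' ,,, 'g | h"

# Option 1: split as in the answer, then strip the leftover quote characters.
stripped = [p.strip("'\"") for p in re.split(r"(?<=.)'[^']+'", txt)]
print(stripped)  # ['a b | c', 'd | e f', 'g | h']

# Option 2: also match the opening quote (^'), which leaves an empty first
# item that then has to be filtered out.
parts = re.split(r"^'|'[^']+'", txt)
print(parts)  # ['', 'a b | c', 'd | e f', 'g | h']
items = [p for p in parts if p]
print(items)  # ['a b | c', 'd | e f', 'g | h']
```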
An alternative with findall:
re.findall(r"'([^']+)'?", txt)
output: ['a b | c', 'd | e f', 'g | h']
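To cover the question's full requirements, here is a sketch that extends the findall approach to accept either ' or " as the quote character and then removes the spaces around |, as the first answer suggests. The function name extract_items is hypothetical. It assumes items never contain quote characters themselves; the closing quote is optional so an unterminated last item, as in the sample, still matches:

```python
import re

def extract_items(txt):
    """Hypothetical helper: return the quoted items with spaces around '|' removed.

    Assumes each item opens with ' or " and contains no quote characters;
    the closing quote is optional so an unterminated last item still matches.
    """
    items = re.findall(r"""['"]([^'"]+)['"]?""", txt)
    return [re.sub(r"\s*\|\s*", "|", item) for item in items]

# Sample string from the question, taken literally.
print(extract_items(r"'a b | c'\,\,\, 'd | e f' ,,, 'g | h"))
# ['a b|c', 'd|e f', 'g|h']

# Illustrative input mixing both quote styles.
print(extract_items('"a b | c",,, \'d | e f\' ,,, "g | h"'))
# ['a b|c', 'd|e f', 'g|h']
```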