How can I split this string using regular expressions?

Time:03-23

I have a string similar to:

"'a b | c'\,\,\,  'd | e f' ,,, 'g | h"

I want to use re.split to get the following list:

["a b|c", "d|e f", "g|h"]

I have tried the following, but it does not give the output I want. Essentially, I need to get rid of everything other than the letters and the pipe character, and then split. One complication is that both ' and " are sometimes used as quotes:

re.compile(r'[\"\',][\W ]', re.UNICODE).split(txt.lower())

CodePudding user response:

Remove the spaces around | as a separate step after splitting.

split = re.compile(r'[\"\',][\W ]', re.UNICODE).split(txt.lower())
cleaned = [re.sub(r'\s*\|\s*', '|', x) for x in split]
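Here is a minimal sketch of this two-step approach run on the sample string from the question. The split pattern is copied unchanged from the question. On this input it leaves stray quotes and empty strings behind, so the third step (stripping quotes and dropping empty pieces) and the name `result` are assumptions added for illustration, not part of the answer above.

```python
import re

# Sample string from the question; a raw string keeps the backslashes literal.
txt = r"'a b | c'\,\,\,  'd | e f' ,,, 'g | h"

# Step 1: the split from the question, unchanged.
parts = re.compile(r'[\"\',][\W ]', re.UNICODE).split(txt.lower())

# Step 2: collapse whitespace around the pipe, as suggested above.
cleaned = [re.sub(r'\s*\|\s*', '|', x) for x in parts]

# Step 3 (assumption, not in the original answer): strip leftover quotes
# and whitespace, then drop the empty pieces the split leaves behind.
result = [s for s in (x.strip(" '\"") for x in cleaned) if s]

print(parts)   # ["'a b | c", '', '', '', " 'd | e f", '', '', "'g | h"]
print(result)  # ['a b|c', 'd|e f', 'g|h']
```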

CodePudding user response:

I don't think you can do this with split alone: either the first quote stays in the result, or you end up with an empty first item.

Here is one attempt, but it fails to remove the initial ':

re.split(r"(?<=.)'[^']+'", txt)

output: ["'a b | c", 'd | e f', 'g | h']
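The trade-off between a leftover quote and an empty first item can be seen side by side in the sketch below. The second pattern, with `^'` added as an alternative separator, is an assumption for illustration and not something from the answer. The names `with_lookbehind`, `with_anchor` and `filtered` are likewise hypothetical.

```python
import re

# Sample string from the question; a raw string keeps the backslashes literal.
txt = r"'a b | c'\,\,\,  'd | e f' ,,, 'g | h"

# With the lookbehind, the opening quote at index 0 cannot start a match,
# so it stays attached to the first item.
with_lookbehind = re.split(r"(?<=.)'[^']+'", txt)
print(with_lookbehind)  # ["'a b | c", 'd | e f', 'g | h']

# Letting the separator also match a quote at the very start of the string
# removes that quote, but split then yields an empty first item instead.
with_anchor = re.split(r"^'|'[^']+'", txt)
print(with_anchor)  # ['', 'a b | c', 'd | e f', 'g | h']

# Dropping empty strings afterwards is one possible workaround.
filtered = [p for p in with_anchor if p]
print(filtered)  # ['a b | c', 'd | e f', 'g | h']
```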

An alternative with findall:

re.findall(r"'([^']+)'?", txt)

output: ['a b | c', 'd | e f', 'g | h']
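To get all the way to the list the question asks for, the findall result still needs the spaces around the pipe removed. The sketch below combines the two steps. The helper name `split_items` is hypothetical. Accepting both ' and " as delimiters is an assumption based on the question's remark that both quote styles occur, and it presumes neither quote character appears inside an item.

```python
import re

def split_items(txt):
    """Extract quoted items and normalise the spacing around '|'."""
    # Capture each run of non-quote characters that follows a quote.
    # The closing quote is optional so an unterminated last item still matches.
    items = re.findall(r"""['"]([^'"]+)['"]?""", txt)
    # Collapse any whitespace around the pipe.
    return [re.sub(r"\s*\|\s*", "|", item) for item in items]

# Sample string from the question; a raw string keeps the backslashes literal.
txt = r"'a b | c'\,\,\,  'd | e f' ,,, 'g | h"
print(split_items(txt))  # ['a b|c', 'd|e f', 'g|h']

# Mixed quote styles (made-up input for illustration).
print(split_items('"a b | c" ,,, \'d | e f\''))  # ['a b|c', 'd|e f']
```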
