I have a string containing two phrases, separated by an uppercase word:
c="Text is here. TEST . More text here also"
I want to split it into the two phrases and drop the uppercase word, TEST,
so that the output looks like:
["Text is here.","More text here also"]
What I did:
import re
c="Text is here. TEST . More text here also"
s = re.split(r'[A-Z][A-Z\d]+ ', c)
t = [re.sub(r'[^A-Za-z0-9]', ' ', i) for i in s]
But I still get some unwanted spaces:
['Text is here  ', '  More text here also']
Is there a cleaner, more Pythonic way to generate t?
CodePudding user response:
>>> re.split(r'\s*[A-Z]{2,}[\s.]*', c)
['Text is here.', 'More text here also']
The pattern matches optional whitespace, then at least two uppercase letters, then any run of whitespace or dots (also optional), so the separator and its surrounding padding are consumed together.
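To make the pieces of that pattern easier to see, here is a minimal sketch that spells it out with re.VERBOSE. The names SEPARATOR and split_on_caps are made up for illustration; the regex itself is the one from the answer above.

```python
import re

# SEPARATOR and split_on_caps are hypothetical names used only in this sketch.
# re.VERBOSE lets each part of the pattern carry its own comment.
SEPARATOR = re.compile(r"""
    \s*          # optional whitespace before the separator word
    [A-Z]{2,}    # the separator: two or more uppercase letters
    [\s.]*       # optional whitespace or dots after it
""", re.VERBOSE)

def split_on_caps(text):
    """Split text wherever the separator pattern matches."""
    return SEPARATOR.split(text)

c = "Text is here. TEST . More text here also"
print(split_on_caps(c))  # ['Text is here.', 'More text here also']
```

One caveat: any run of two or more capitals counts as a separator, so a phrase that contains an acronym would be split there as well.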
CodePudding user response:
This works, but it isn't very elegant, since it hard-codes the separator word:
c="Text is here. TEST . More text here also"
In [20]: [i.strip().replace('. ','') for i in c.split('TEST')]
Out[20]: ['Text is here.', 'More text here also']
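If the separator word is not known in advance, a regex-free variant of the same idea could look like the sketch below. It assumes the separator is a whitespace-delimited token of at least two characters that str.isupper() accepts; split_on_upper_word is a made-up name.

```python
def split_on_upper_word(text):
    """Split text at all-uppercase tokens (hypothetical helper, not from the answers above)."""
    parts, current = [], []
    for token in text.split():
        # Assumption: a token of two or more characters that is entirely
        # uppercase is a separator rather than part of a phrase.
        if len(token) >= 2 and token.isupper():
            parts.append(" ".join(current))
            current = []
        else:
            current.append(token)
    parts.append(" ".join(current))
    # Remove dots and spaces left at the start of each part, then skip empty parts.
    cleaned = [p.lstrip(" .") for p in parts]
    return [p for p in cleaned if p]

c = "Text is here. TEST . More text here also"
print(split_on_upper_word(c))  # ['Text is here.', 'More text here also']
```

Because it rebuilds each phrase with single spaces, this version also collapses runs of whitespace inside the phrases, which may or may not be wanted.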