Here is the code:
import re
text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"
splitter = re.split('Sir|Mrs', text)
I want the text to be split at the words 'Sir' or 'Mrs', unless the string 'married to' comes immediately before the word.
Current output:
''
'John Doe, married to'
'Jane Doe,'
'Jack Doe,'
'Mary Doe'
Desired output:
''
'John Doe, married to Mrs Jane Doe,'
'Jack Doe,'
'Mary Doe'
CodePudding user response:
I would use an re.findall approach here:
import re
text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"
matches = re.findall(r'\b(?:Sir|Mrs) \w+ \w+(?:, married to (?:Mrs|Sir) \w+ \w+)?', text)
print(matches)
This prints:
['Sir John Doe, married to Mrs Jane Doe', 'Sir Jack Doe', 'Mrs Mary Doe']
The regex pattern used here says to match:
\b(?:Sir|Mrs)                       a leading Sir/Mrs
 \w+ \w+                            first and last names
(?:
, married to (?:Mrs|Sir) \w+ \w+    optional 'married to' followed by another name
)?                                  zero or one time
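If you would rather keep re.split, a negative lookbehind is one possible alternative. This is only a minimal sketch: it assumes 'married to' is always followed by exactly one space before the title (Python's lookbehind needs a fixed-width pattern), and the names pattern and parts are just for illustration.

```python
import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"

# Split on Sir/Mrs only when the title is NOT directly preceded by "married to ".
pattern = r'(?<!married to )\b(?:Sir|Mrs)\b'

# re.split leaves the surrounding spaces in each piece, so strip them.
parts = [part.strip() for part in re.split(pattern, text)]
print(parts)
```

The first element is an empty string because the text starts with a title; filter it out if you do not need it.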