python - re.split a string with a keyword unless there is a specific keyword preceding it

Time:07-29

Here is the code:

import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"
splitter = re.split('Sir|Mrs', text)

I want the text to be split on the words 'Sir' or 'Mrs', unless the word is immediately preceded by the string 'married to'.

Current output:

''
'John Doe, married to'
'Jane Doe,'
'Jack Doe,'
'Mary Doe'
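For reference, here is what the plain split actually returns. re.split drops each 'Sir'/'Mrs' match and keeps the spaces around the remaining pieces, so the listing above appears to show them stripped. This is a small illustrative sketch; the variable names are chosen here, not taken from the question.

```python
import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"

# Every 'Sir'/'Mrs' is a split point, including the 'Mrs' after 'married to'.
parts = re.split(r'Sir|Mrs', text)
print(parts)
# ['', ' John Doe, married to ', ' Jane Doe, ', ' Jack Doe, ', ' Mary Doe']

# Stripping each piece reproduces the "Current output" listing.
stripped = [p.strip() for p in parts]
print(stripped)
# ['', 'John Doe, married to', 'Jane Doe,', 'Jack Doe,', 'Mary Doe']
```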

Desired output:

''
'John Doe, married to Mrs Jane Doe,'
'Jack Doe,'
'Mary Doe'
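If you want to stay with re.split, one option (not part of the answer below) is a negative lookbehind, which blocks a split when the title is directly preceded by 'married to '. This is a minimal sketch that assumes the phrase is always written exactly as 'married to ' with single spaces; the `\b` word boundaries are an added assumption to avoid splitting inside longer words.

```python
import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"

# Split on Sir/Mrs only when NOT immediately preceded by 'married to '.
# Python requires lookbehinds to be fixed-width; this one is 11 characters.
splitter = re.split(r'(?<!married to )\b(?:Sir|Mrs)\b', text)

# Remove the spaces left around each piece, as in the desired output.
result = [part.strip() for part in splitter]
print(result)
# ['', 'John Doe, married to Mrs Jane Doe,', 'Jack Doe,', 'Mary Doe']
```

Because the lookbehind must be fixed-width, variations such as 'married to  Mrs' (two spaces) or 'wife of Mrs' would still be split; the findall approach below can be extended more easily for those cases.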

Answer:

I would use an re.findall approach here:

import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"
matches = re.findall(r'\b(?:Sir|Mrs) \w+ \w+(?:, married to (?:Mrs|Sir) \w+ \w+)?', text)
print(matches)

This prints:

['Sir John Doe, married to Mrs Jane Doe', 'Sir Jack Doe', 'Mrs Mary Doe']

The regex pattern used here says to match:

\b(?:Sir|Mrs)                            leading Sir/Mrs
 \w+ \w+                                 first and last names
(?:
    , married to (?:Mrs|Sir) \w+ \w+     optional 'married to' followed by another name
)?                                       zero or one time
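The breakdown above maps directly onto Python's re.VERBOSE flag, which lets the comments live inside the pattern itself. This is a sketch of the same regex rewritten that way, not the answerer's code; note that verbose mode ignores unescaped whitespace, so each literal space is written as `[ ]`.

```python
import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"

# Same pattern as the findall answer, laid out with inline comments.
pattern = re.compile(r"""
    \b(?:Sir|Mrs)                          # leading Sir/Mrs
    [ ]\w+[ ]\w+                           # first and last names
    (?:
        ,[ ]married[ ]to[ ](?:Mrs|Sir)     # optional 'married to' plus title
        [ ]\w+[ ]\w+                       # and the spouse's first and last names
    )?                                     # zero or one time
""", re.VERBOSE)

matches = pattern.findall(text)
print(matches)
# ['Sir John Doe, married to Mrs Jane Doe', 'Sir Jack Doe', 'Mrs Mary Doe']
```

Since the pattern has only non-capturing groups, findall returns the full match for each hit.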