Given a file 'funeral.txt', write a regular expression that matches people's names introduced by an honorific (Mr, Ms, Senator, etc.).
So far I have written this code:
import re
file = open("funeral.txt", "r")
text = file.read()
file.close()
honorific_names = r"([M][a-z\.]{2,4})\s+([A-Z][a-z]*[\.\-\']*)*"
re.findall(honorific_names, text)
This gives me the output:
[('Mrs.', 'Meghan'), ('Mrs.', 'Oprah'), ('Mr.', 'Boris'), ('Ms.', 'Megan')]
However, what I am looking for is:
['Mrs. Meghan', 'Mrs. Oprah Winfrey', 'Mr. Boris Johnson', 'Ms. Megan Specia']
How exactly should I modify the pattern?
Some text from funeral.txt:
The service also came just weeks after he and his wife, Mrs. Meghan, the Duchess of Sussex, gave a bombshell interview to Mrs. Oprah Winfrey in which they laid bare their problems with the royal family
According to Mr. Boris Johnson, the whole of United Kingdom are sadden by the passing of Prince Philip.
— Ms. Megan Specia
CodePudding user response:
Apply " ".join to each pair to make one string:
[" ".join(x) for x in [('Mrs.', 'Meghan'), ('Mrs.', 'Oprah'), ('Mr.', 'Boris'), ('Ms.', 'Megan')]]
['Mrs. Meghan', 'Mrs. Oprah', 'Mr. Boris', 'Ms. Megan']
import re
with open("funeral.txt", "r") as file:
    text = file.read()
result = [" ".join(x) for x in re.findall(r"([M][a-z.]{2,4})\s+([A-Z][a-z]*[.\-\']*)*", text)]
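As a side note on why tuples come back in the first place: re.findall returns one tuple per match when the pattern has more than one capturing group, and plain strings when it has none or exactly one. A minimal sketch, assuming a short inline string in place of funeral.txt and a slightly simplified version of the pattern (the variable names are only illustrative):

```python
import re

# Inline sample standing in for the contents of funeral.txt (an assumption for this sketch).
sample = "an interview to Mrs. Oprah Winfrey, said Mr. Boris Johnson"

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r"(M[a-z.]{2,4})\s+([A-Z][a-z]*)", sample)

# Same pattern with non-capturing groups: findall returns each whole match as a string.
no_groups = re.findall(r"(?:M[a-z.]{2,4})\s+(?:[A-Z][a-z]*)", sample)

print(two_groups)  # [('Mrs.', 'Oprah'), ('Mr.', 'Boris')]
print(no_groups)   # ['Mrs. Oprah', 'Mr. Boris']
```

Either way the surname is still missing, because the name part of the pattern never allows whitespace between words.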
CodePudding user response:
honorific_names = r'((?:Mrs|Mr|Ms|Senator|Sr|Dr)\.?\s(?:[A-Z][a-z]+)(?:\s[A-Z][a-z]+)*)'
I think this at least works on your sample text, and it should be easy to extend with more honorifics as you need them. It may not be the cleanest solution: the Mr/Ms/Mrs alternatives could be compacted, but I'd say keeping them spelled out is more readable.
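A quick check of this idea against the sample text from the question, assuming the quantifier after each [a-z] class is + (one or more lowercase letters). The compact variant is a hypothetical alternative for the Mr/Ms/Mrs part, not something the sample requires:

```python
import re

# Sample text based on the excerpt in the question, standing in for funeral.txt.
text = (
    "The service also came just weeks after he and his wife, Mrs. Meghan, "
    "the Duchess of Sussex, gave a bombshell interview to Mrs. Oprah Winfrey "
    "in which they laid bare their problems with the royal family\n"
    "According to Mr. Boris Johnson, the whole of United Kingdom are sadden "
    "by the passing of Prince Philip.\n"
    "Ms. Megan Specia\n"
)

# Honorific, optional period, one whitespace character, then one or more capitalized words.
honorific_names = r"((?:Mrs|Mr|Ms|Senator|Sr|Dr)\.?\s(?:[A-Z][a-z]+)(?:\s[A-Z][a-z]+)*)"

# Hypothetical compact variant: M(?:rs?|s) covers Mr, Mrs and Ms in one alternative.
compact = r"((?:M(?:rs?|s)|Senator|Sr|Dr)\.?\s(?:[A-Z][a-z]+)(?:\s[A-Z][a-z]+)*)"

# With a single capturing group, findall returns a list of strings rather than tuples.
names = re.findall(honorific_names, text)
print(names)
# ['Mrs. Meghan', 'Mrs. Oprah Winfrey', 'Mr. Boris Johnson', 'Ms. Megan Specia']
```

Note that Mrs is listed before Mr on purpose: alternatives are tried left to right, so the longer honorific has to come first or the trailing s would break the match.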