I extracted a threaded email body using pywin32 and need to extract only the signature part of each email in the body. I tried to match the text between the start of the signature and the header of the next email. For example, 'With Regards' is the starting phrase and 'From' or 'Von' is the ending word.
Body of the email:
Dear Sir,
Your Order is ready and tested. It will be shipped shortly.
Let me know once you receive it.
With Regards,
dcabv,
vce technologies
vce.com
cont: 00440044
From: [email protected]
To:[email protected]
Sub: product
Dear Sir,
Can I get the update of my order?
With Regards,
abc
cont: 46346466
Von: [email protected]
Gesendet:[email protected]
Sub: Order Placed
Dear Sir,
your order has placed.
you will receive it shortly.
With Regards,
dcabv,
vce technologies
vce.com
cont: 00440044
Another Text:
a = """
Best regards,
i.V. Cap. Mars Wel
Chief Superintendent
==========================
P E N T H O L E
Sahrts-HC
Elba 379
D- 259 Ham
Tel: 58 58 58585-584
Mobile: 91 758 858 5875
Fax: 47 47 85885-855
Sitz: Ham, HA 5772
Von: Korayae Vinay <[email protected]>
Gesendet: Donnes, 19. Januar 2014 12:16
An: Wel, Mars <[email protected]>
Betreff: RE: Prod Order
Dear Donnes;
Good day
A few minutes before ı placed the order.
If you need any assistance we are happy to help with that.
Best Regards
Korayae Vinay
Managing Director
"""
Any suggestions?
The code I tried is below:
re.findall('(?:(?:With best regards,|Best Regards,)\s*.*(?:Von:|From:))', body, flags = re.IGNORECASE|re.DOTALL|re.MULTILINE)
Note: body is the body of the extracted email.
The Output I received:
With Regards,
dcabv,
vce technologies
vce.com
cont: 00440044
From: [email protected]
To:[email protected]
Sub: product
Dear Sir,
Can I get the update of my order?
With Regards,
abc
cont: 46346466
Von:
But I want the output to be split into two matches: the 1st should run from 'With Regards' of the 1st email until 'From'.
The 2nd should run from 'With Regards' of the 2nd email until 'Von'.
CodePudding user response:
Assuming that the lines below the signature line don't contain an empty line, you may use:
(?mi)^with(?:\s+best)?\s+regards.*(?:\n.+)+
RegEx Breakdown:
(?mi) : Enable ignore-case and multiline modes
^ : Match the start of a line
with : Match the text with
(?:\s+best)? : Optionally match 1+ whitespace characters followed by best
\s+regards : Match 1+ whitespace characters followed by regards
.* : Match everything else until the end of the line
(?:\n.+)+ : Match a line break followed by 1+ of any character. Repeat this group 1+ times, so the match stops at the first empty line
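As a rough illustration of the breakdown above, here is a minimal Python sketch. The sample text and the addresses in it are made up for the example, and it assumes an empty line follows each signature; without that empty line the match would run on into the next email.

```python
import re

# Hypothetical sample thread: an empty line is assumed after the first signature.
body = """Dear Sir,
Your Order is ready and tested. It will be shipped shortly.
With Regards,
dcabv,
vce technologies
cont: 00440044

From: sender@example.com
Sub: product
Dear Sir,
Can I get the update of my order?
With Regards,
abc
cont: 46346466
"""

# Signature line, then every following non-empty line.
SIGNATURE = re.compile(r"(?mi)^with(?:\s+best)?\s+regards.*(?:\n.+)+")

# No capturing groups are used, so findall returns the full matches.
signatures = SIGNATURE.findall(body)

for number, signature in enumerate(signatures, start=1):
    print(f"--- signature {number} ---")
    print(signature)
```

Each element of signatures is one signature block, from the 'With Regards' line down to the last non-empty line after it.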
Update:
Based on your edited question, you may use this regex:
(?mi)^(?:with\s+)?(?:best\s+)?regards.*(?:\n(?!von:|from:).*)+
Here (?!von:|from:) is a negative lookahead that stops the match when the next line starts with von: or from:.
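A small sketch of how the lookahead version could be applied in Python follows. The sample text and addresses are invented for the example, and it assumes every quoted email begins with a From: or Von: header line directly after the previous signature.

```python
import re

# Hypothetical sample thread with no empty lines between the emails.
body = """Dear Sir,
Your Order is ready and tested.
With Regards,
dcabv,
vce technologies
cont: 00440044
From: customer@example.com
Sub: product
Dear Sir,
Can I get the update of my order?
With Regards,
abc
cont: 46346466
Von: shop@example.com
Sub: Order Placed
Dear Sir,
your order has placed.
"""

# Signature line, then every following line that does not start with von: or from:.
SIGNATURE = re.compile(
    r"(?mi)^(?:with\s+)?(?:best\s+)?regards.*(?:\n(?!von:|from:).*)+"
)

# No capturing groups are used, so findall returns the full matches.
signatures = SIGNATURE.findall(body)

for number, signature in enumerate(signatures, start=1):
    print(f"--- signature {number} ---")
    print(signature)
```

A signature at the very end of the thread has no following header, so that match runs to the end of the text and may need trailing whitespace stripped.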