I'm trying to use a regex to pull certain substrings out of the body of an Outlook email.
Here is the function with the regex that I'm trying to use:
import re

def EventDescription(emailbody):
    regex = r'\d\d:\d\d\sDescription: (.*?)\s(?:_)'
    # findall returns a list of the captured groups (empty if nothing matched)
    substring = re.findall(regex, emailbody, re.DOTALL)
    # return the matches
    return substring
Here is the raw string I'm trying to perform the regex on:
***External Sender - This email is from an external sender.***
Date/Time Start:
2021-12-25 08:38
Anticipated Date/Time Restored:
2021-12-25 16:21
Duration:
7.72
Outage Type:
Forced Outage (FO)
Capacity De-Rate:
0.00
Maximo Work Orders:
Work Order #:NA00360131, Description: ALO1: ALO1_B002_P008.Inv016 - Inverter 16 is derating due to a broken tracking motor.
Created On: 2021-12-26 09:06
Description: Updated!
Inverter 16 is restored by site tech.
________________________________________
Date/Time Start:
2021-12-25 09:53
Anticipated Date/Time Restored:
2021-12-27 16:00
Duration:
54.12
Outage Type:
Forced Outage (FO)
Capacity De-Rate:
0.85
Maximo Work Orders:
Work Order #:NA00360131, Description: ALO1: ALO1_B002_P008.Inv016 - Inverter 16 is derating due to a broken tracking motor.
Created On: 2021-12-25 13:58
Description:
Updated ETR.
________________________________________
Date/Time Start:
2021-12-25 09:53
Anticipated Date/Time Restored:
2021-12-25 16:00
Duration:
6.12
Outage Type:
Forced Outage (FO)
Capacity De-Rate:
0.85
Maximo Work Orders:
Work Order #:NA00360131, Description: ALO1: ALO1_B002_P008.Inv016 - Inverter 16 is derating due to a broken tracking motor.
Created On: 2021-12-25 09:54
Description:
Inverter 16 tripped offline. No fault code available in FleetCon or SCADA. Site personnel informed.
________________________________________
But when I try to pull one of the entries from the list "substring", I get an "IndexError: list index out of range" error. Any suggestions?
CodePudding user response:
Your regex is more complicated than it needs to be and does not return any matches, so findall gives you an empty list and indexing into it raises the IndexError. You should try this one:
regex = r'\d\d:\d\d\sDescription: (.*)'
It simply matches everything after "Description: " up to the end of that line. With your example string, it returns all matches you requested.
EDIT: You also have to remove the re.DOTALL argument from findall for it to work. With re.DOTALL the dot also matches newlines, so the match does not stop at the end of the line. With the following code you should receive the desired output:
import re

def EventDescription(emailbody):
    regex = r'\d\d:\d\d\sDescription: (.*)'
    # no re.DOTALL here, so '.' stops at the end of the line
    substring = re.findall(regex, emailbody)
    # return the matches
    return substring
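To see what the flag changes, here is a minimal sketch on a shortened, made-up excerpt of the email (the sample text and the variable names are assumptions for illustration, not the real message). It runs the same pattern with and without re.DOTALL:

```python
import re

# Shortened, hypothetical excerpt; the real email body is longer.
sample = (
    "Created On: 2021-12-26 09:06\n"
    "Description: Updated!\n"
    "Inverter 16 is restored by site tech.\n"
    "________________________________________\n"
    "Created On: 2021-12-25 13:58\n"
    "Description: Updated ETR.\n"
    "________________________________________\n"
)

pattern = r'\d\d:\d\d\sDescription: (.*)'

# Default mode: '.' stops at a newline, so each capture ends with its line.
without_dotall = re.findall(pattern, sample)

# DOTALL: '.' also matches newlines, so the greedy '.*' runs to the end of the text.
with_dotall = re.findall(pattern, sample, re.DOTALL)

print(without_dotall)    # ['Updated!', 'Updated ETR.']
print(len(with_dotall))  # 1
```

Note that without re.DOTALL only the first line of a multi-line description is captured (here "Inverter 16 is restored by site tech." is left out), and that the pattern expects a space right after "Description:".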
CodePudding user response:
Try something like this:
Description:\s(.*?)_{40,}
This way we're matching everything between Description and the line of underscores. Note you still need to use DOTALL mode.
https://regex101.com/r/qoiSwl/1
I added "End of Event \n\n\n" just to highlight where each match starts and ends, since with the newlines it's hard to tell.
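In Python this could look roughly like the sketch below. The sample text is a shortened, made-up excerpt and the helper name extract_descriptions is hypothetical. One thing to watch for: the "Work Order" line also contains "Description:", so the pattern as written starts matching there. The ^ anchor combined with re.MULTILINE is an added assumption to skip that line, not part of the pattern above.

```python
import re

RULE = "_" * 40  # separator line between events

# Shortened, hypothetical excerpt of the email body.
sample = "\n".join([
    "Work Order #:NA00360131, Description: ALO1: Inverter 16 is derating.",
    "Created On: 2021-12-25 13:58",
    "Description:",
    "Updated ETR.",
    RULE,
    "Work Order #:NA00360131, Description: ALO1: Inverter 16 is derating.",
    "Created On: 2021-12-25 09:54",
    "Description:",
    "Inverter 16 tripped offline.",
    RULE,
])

# Pattern as given: the match begins at the first "Description:" it finds,
# which is the one on the "Work Order" line.
raw = re.findall(r'Description:\s(.*?)_{40,}', sample, re.DOTALL)

def extract_descriptions(text):
    """Hypothetical helper: only match "Description:" at the start of a line."""
    flags = re.DOTALL | re.MULTILINE
    found = re.findall(r'^Description:\s(.*?)_{40,}', text, flags)
    # Drop the trailing newline left in front of the underscore line.
    return [item.strip() for item in found]

descriptions = extract_descriptions(sample)
print(raw[0].startswith("ALO1:"))  # True
print(descriptions)  # ['Updated ETR.', 'Inverter 16 tripped offline.']
```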
CodePudding user response:
Use
\d{2}:\d{2}\s+Description:\s*(.*?)\s+_
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  Description:             'Description:'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .*?                      any character (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  _                        '_'
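As a rough sketch of how this pattern could be used from Python: the sample text below is a shortened, made-up excerpt, and passing re.DOTALL is an assumption so that a description can span several lines. The last step guards against the IndexError from the question by checking for an empty list before indexing.

```python
import re

RULE = "_" * 40  # separator line between events

# Shortened, hypothetical excerpt of the email body.
sample = "\n".join([
    "Work Order #:NA00360131, Description: ALO1: ALO1_B002_P008.Inv016 - derating.",
    "Created On: 2021-12-26 09:06",
    "Description: Updated!",
    "Inverter 16 is restored by site tech.",
    RULE,
    "Work Order #:NA00360131, Description: ALO1: ALO1_B002_P008.Inv016 - derating.",
    "Created On: 2021-12-25 13:58",
    "Description:",
    "Updated ETR.",
    RULE,
])

pattern = r'\d{2}:\d{2}\s+Description:\s*(.*?)\s+_'

# DOTALL lets '.' cross newlines so multi-line descriptions are captured whole.
matches = re.findall(pattern, sample, re.DOTALL)

# Guard against an empty list before indexing.
first = matches[0] if matches else None

print(matches)
# ['Updated!\nInverter 16 is restored by site tech.', 'Updated ETR.']
```

Because the pattern requires a time such as 09:06 right before "Description:", the "Description:" on the "Work Order" line is skipped.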