Why is regex not returning any matches?

I'm trying to use a regex to pull certain substrings from the body text of an Outlook email.

Here is the function with the regex that I'm trying to use:

def EventDescription(emailbody):
    regex = r'\d\d:\d\d\sDescription: (.*?)\s(?:_)'
    substring = re.findall(regex, emailbody, re.DOTALL)
    # return the matches
    return substring

Here is the raw string I'm trying to perform the regex on:

***External Sender - This email is from an external sender.*** 
     
Date/Time Start:     
2021-12-25 08:38     
Anticipated Date/Time Restored:  
2021-12-25 16:21     
Duration:    
7.72     
Outage Type:     
Forced Outage (FO)   
Capacity De-Rate:    
0.00     
Maximo Work Orders:  
Work Order #:NA00360131, Description: ALO1: ALO1_B002_P008.Inv016 - Inverter 16 is derating due to a broken tracking motor.  

Created On: 2021-12-26 09:06
Description: Updated!
Inverter 16 is restored by site tech.
________________________________________

     
Date/Time Start:     
2021-12-25 09:53     
Anticipated Date/Time Restored:  
2021-12-27 16:00     
Duration:    
54.12    
Outage Type:     
Forced Outage (FO)   
Capacity De-Rate:    
0.85     
Maximo Work Orders:  
Work Order #:NA00360131, Description: ALO1: ALO1_B002_P008.Inv016 - Inverter 16 is derating due to a broken tracking motor.  

Created On: 2021-12-25 13:58
Description: 
Updated ETR.
________________________________________

     
Date/Time Start:     
2021-12-25 09:53     
Anticipated Date/Time Restored:  
2021-12-25 16:00     
Duration:    
6.12     
Outage Type:     
Forced Outage (FO)   
Capacity De-Rate:    
0.85     
Maximo Work Orders:  
Work Order #:NA00360131, Description: ALO1: ALO1_B002_P008.Inv016 - Inverter 16 is derating due to a broken tracking motor.  

Created On: 2021-12-25 09:54
Description: 
Inverter 16 tripped offline. No fault code available in FleetCon or SCADA. Site personnel informed.
________________________________________

But when I try to access one of the entries in the list "substring", I get an "IndexError: list index out of range" error. Any suggestions?

CodePudding user response：

Your regex is pretty complicated and does not return any matches. You should try this one:

regex = r'\d\d:\d\d\sDescription: (.*)'

It simply captures everything after "Description: " up to the end of that line. With your example string, it returns all matches you requested.

EDIT: You also have to remove the re.DOTALL argument from findall for it to work. With re.DOTALL, the dot also matches newline characters, so the match does not stop at the end of the line. With the following code you should receive the desired output:

import re

def EventDescription(emailbody):
    # no re.DOTALL here, so (.*) stops at the end of the line
    regex = r'\d\d:\d\d\sDescription: (.*)'
    substring = re.findall(regex, emailbody)
    # return the matches
    return substring
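
To see what the flag changes, here is a small sketch on a made-up sample string (not the real email body; the names sample, without_flag and with_flag are only for illustration):

```python
import re

# Hypothetical shortened sample, not the real email body.
sample = "09:06\nDescription: Updated!\nInverter 16 is restored.\n____"

pattern = r'\d\d:\d\d\sDescription: (.*)'

# Default mode: '.' does not match '\n', so the capture ends at the line break.
without_flag = re.findall(pattern, sample)

# re.DOTALL: '.' matches '\n' too, so the greedy capture runs to the end of the string.
with_flag = re.findall(pattern, sample, re.DOTALL)

print(without_flag)  # ['Updated!']
print(with_flag)     # ['Updated!\nInverter 16 is restored.\n____']
```

Without the flag, the capture ends at the first line break, so only the first line of a multi-line description is returned.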

CodePudding user response：

Try something like this:

Description:\s(.*?)_{40,}

This way we're matching everything between "Description:" and the line of underscores. Note you still need to use DOTALL mode, because the description can span several lines.

https://regex101.com/r/qoiSwl/1

I added End of Event \n\n\n just to highlight where each match starts and ends, since with the newlines it's hard to tell.
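
A minimal sketch of how this pattern could be used from Python, assuming a shortened, made-up email body with two entries (the names RULE, body and descriptions are only for illustration):

```python
import re

# Hypothetical, shortened email body: two entries, each closed by 40 underscores.
RULE = "_" * 40
body = (
    "Created On: 2021-12-26 09:06\n"
    "Description: Updated!\n"
    "Inverter 16 is restored by site tech.\n"
    f"{RULE}\n"
    "\n"
    "Created On: 2021-12-25 13:58\n"
    "Description: \n"
    "Updated ETR.\n"
    f"{RULE}\n"
)

# DOTALL lets the lazy (.*?) run across line breaks until the underscore line.
pattern = re.compile(r"Description:\s(.*?)_{40,}", re.DOTALL)

# strip() removes the whitespace left around each captured description.
descriptions = [m.strip() for m in pattern.findall(body)]
print(descriptions)
# ['Updated!\nInverter 16 is restored by site tech.', 'Updated ETR.']
```

The sample deliberately leaves out the "Work Order #: ..., Description: ..." line. That line also contains "Description: ", so on the full email body the pattern would start matching there instead.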

CodePudding user response：

Use

\d{2}:\d{2}\s+Description:\s*(.*?)\s+_

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  Description:             'Description:'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .*?                      any character (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  _                        '_'
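
One situation where matching one or more whitespace characters makes the difference is a body with Windows-style \r\n line endings. That is only an assumption here, since the question does not show the raw characters of the email. A small sketch on a made-up sample (the names body, single_ws and multi_ws are only for illustration):

```python
import re

# Hypothetical sample with Windows-style line endings (an assumption about the real email).
body = (
    "Created On: 2021-12-26 09:06\r\n"
    "Description: Updated!\r\n"
    "Inverter 16 is restored by site tech.\r\n"
    "________________________________________\r\n"
)

# Pattern from the question: exactly one whitespace character before "Description".
single_ws = re.findall(r"\d\d:\d\d\sDescription: (.*?)\s(?:_)", body, re.DOTALL)

# Pattern with \s+ : one or more whitespace characters, so "\r\n" is accepted.
multi_ws = re.findall(r"\d{2}:\d{2}\s+Description:\s*(.*?)\s+_", body, re.DOTALL)

print(single_ws)  # []
print(multi_ws)   # ['Updated!\r\nInverter 16 is restored by site tech.']
```

Since findall returns an empty list when nothing matches, indexing the result of the single-whitespace pattern on such input raises IndexError: list index out of range.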