Why is regex not returning any matches?

Time:12-28

I'm attempting to use regexes to pull certain substrings from the body text of an Outlook email.

Here is the function with the regex that I'm trying to use:

import re

def EventDescription(emailbody):
    regex = r'\d\d:\d\d\sDescription: (.*?)\s(?:_)'
    substring = re.findall(regex, emailbody, re.DOTALL)
    # return the matches
    return substring

Here is the raw string I'm trying to perform the regex on:

***External Sender - This email is from an external sender.*** 
     
Date/Time Start:     
2021-12-25 08:38     
Anticipated Date/Time Restored:  
2021-12-25 16:21     
Duration:    
7.72     
Outage Type:     
Forced Outage (FO)   
Capacity De-Rate:    
0.00     
Maximo Work Orders:  
Work Order #:NA00360131, Description: ALO1: ALO1_B002_P008.Inv016 - Inverter 16 is derating due to a broken tracking motor.  

Created On: 2021-12-26 09:06
Description: Updated!
Inverter 16 is restored by site tech.
________________________________________

     
Date/Time Start:     
2021-12-25 09:53     
Anticipated Date/Time Restored:  
2021-12-27 16:00     
Duration:    
54.12    
Outage Type:     
Forced Outage (FO)   
Capacity De-Rate:    
0.85     
Maximo Work Orders:  
Work Order #:NA00360131, Description: ALO1: ALO1_B002_P008.Inv016 - Inverter 16 is derating due to a broken tracking motor.  

Created On: 2021-12-25 13:58
Description: 
Updated ETR.
________________________________________

     
Date/Time Start:     
2021-12-25 09:53     
Anticipated Date/Time Restored:  
2021-12-25 16:00     
Duration:    
6.12     
Outage Type:     
Forced Outage (FO)   
Capacity De-Rate:    
0.85     
Maximo Work Orders:  
Work Order #:NA00360131, Description: ALO1: ALO1_B002_P008.Inv016 - Inverter 16 is derating due to a broken tracking motor.  

Created On: 2021-12-25 09:54
Description: 
Inverter 16 tripped offline. No fault code available in FleetCon or SCADA. Site personnel informed.
________________________________________

But when I try to pull one of the entries from the list "substring", I get an "IndexError: list index out of range" error. Any suggestions?

CodePudding user response:

Your regex is pretty complicated and does not return any matches. You should try this one:

regex = r'\d\d:\d\d\sDescription: (.*)'

It simply captures the rest of the line after Description: . With your example string, it returns one match per entry; where the text only begins on the line after Description: , the captured string is empty.

EDIT: You also have to remove the re.DOTALL argument from findall for it to work. With re.DOTALL, the dot also matches newlines, so (.*) does not stop at the end of the line. With the following code you should receive the desired output:

import re

def EventDescription(emailbody):
    # Capture the rest of the line that follows "Description: "
    regex = r'\d\d:\d\d\sDescription: (.*)'
    substring = re.findall(regex, emailbody)
    # return the matches
    return substring
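
To see what the flag changes, here is a small sketch run on shortened, hypothetical stand-ins for two of the entries. The variable names are illustrative only, and single \n line endings are assumed, which may not hold for a real Outlook body.

```python
import re

# Shortened stand-ins for two entries of the email body (assumed \n line endings).
same_line = "\n".join([
    "Created On: 2021-12-26 09:06",
    "Description: Updated!",
    "Inverter 16 is restored by site tech.",
    "_" * 40,
    "",
])
next_line = "\n".join([
    "Created On: 2021-12-25 13:58",
    "Description: ",
    "Updated ETR.",
    "_" * 40,
    "",
])

pattern = r'\d\d:\d\d\sDescription: (.*)'

# Default mode: '.' does not match '\n', so the group ends at the end of the line.
default_same = re.findall(pattern, same_line)
default_next = re.findall(pattern, next_line)

# DOTALL: '.' also matches '\n', so the greedy group runs to the end of the text.
dotall_same = re.findall(pattern, same_line, re.DOTALL)

print(default_same)
print(default_next)
print(dotall_same)
```

In default mode the first sample yields only 'Updated!', and the second yields an empty string because its text starts on the line after Description: . With re.DOTALL the single match swallows everything up to the end of the input, including the underscore line.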

CodePudding user response:

Try something like this:

Description:\s(.*?)_{40,}

This way we're matching everything between Description and the line of underscores. Note you still need to use DOTALL mode.

https://regex101.com/r/qoiSwl/1

I added End of Event \n\n\n just to highlight where each match starts and ends, since with the newlines it's hard to tell.
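
As a rough sketch of how this pattern could be used from Python, the snippet below runs it over a hypothetical two-entry sample. The sample text, the SEPARATOR name and the strip() call are assumptions made for illustration; they are not part of the original email handling code.

```python
import re

# Hypothetical two-entry sample; the separator is 40 underscores, as the pattern expects.
SEPARATOR = "_" * 40
sample = "\n".join([
    "Created On: 2021-12-25 13:58",
    "Description: ",
    "Updated ETR.",
    SEPARATOR,
    "",
    "Created On: 2021-12-25 09:54",
    "Description: ",
    "Inverter 16 tripped offline.",
    SEPARATOR,
])

# DOTALL lets '.' cross line breaks; the lazy '.*?' stops at the first separator line.
pattern = re.compile(r'Description:\s(.*?)_{40,}', re.DOTALL)

# Strip the newlines that surround each captured description.
descriptions = [text.strip() for text in pattern.findall(sample)]
print(descriptions)
```

One caveat: in the full email the Work Order line also contains Description: , so a match would begin there and run through to the separator. Prefixing the pattern with the time anchor from the question (\d\d:\d\d\s) is one way to skip that line.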

CodePudding user response:

Use

\d{2}:\d{2}\s+Description:\s*(.*?)\s+_

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  Description:             'Description:'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .*?                      any character (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  _                        '_'
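
One possible reason the more tolerant whitespace matters: if the email body uses Windows-style \r\n line endings (an assumption here, not something confirmed in the question), a single \s can only consume the \r and the literal Description never lines up. The sketch below compares the question's pattern with the one above on a hypothetical \r\n sample, keeping re.DOTALL as in the question.

```python
import re

# Hypothetical entry with \r\n line endings (an assumption about the Outlook body).
sample = "\r\n".join([
    "Created On: 2021-12-26 09:06",
    "Description: Updated!",
    "Inverter 16 is restored by site tech.",
    "_" * 40,
    "",
])

original = r'\d\d:\d\d\sDescription: (.*?)\s(?:_)'
tolerant = r'\d{2}:\d{2}\s+Description:\s*(.*?)\s+_'

# A single \s matches only the '\r', so 'Description' is compared against '\n' and fails.
original_matches = re.findall(original, sample, re.DOTALL)

# \s+ consumes the whole '\r\n' pair before 'Description:' and before the underscores.
tolerant_matches = re.findall(tolerant, sample, re.DOTALL)

print(original_matches)
print(tolerant_matches)
```

On this sample the first list is empty, which would produce exactly the IndexError described in the question, while the second holds the two description lines as one string. Printing repr(emailbody) is a quick way to check which line endings the real body uses.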