Python regex not returning as expected

I am parsing some text with Python and am running into an odd issue...

an example text that is being parsed:

msg:"ET WEB_SPECIFIC_APPS ClarkConnect Linux proxy.php XSS Attempt"; flow:established,to_server; content:"GET"; content:"script"; nocase; content:"/proxy.php?"; nocase; content:"url="; nocase; pcre:"//proxy.php(?|.[\x26\x3B])url=[^&;\x0D\x0A][<>"']/i"; reference:url,www.securityfocus.com/bid/37446/info; reference:url,doc.emergingthreats.net/2010602; classtype:web-application-attack; sid:2010602; rev:4; metadata:created_at 2010_07_30, updated_at 2010_07_30;

my regex:

msgSearch = re.search(r'msg:"(.+)";', line)

actual result:

ET WEB_SPECIFIC_APPS ClarkConnect Linux proxy.php XSS Attempt"; flow:established,to_server; content:"GET"; content:"script"; nocase; content:"/proxy.php?"; nocase; content:"url="; nocase; pcre:"//proxy.php(?|.[\x26\x3B])url=[^&;\x0D\x0A][<>"']/i

expected result:

ET WEB_SPECIFIC_APPS ClarkConnect Linux proxy.php XSS Attempt

There are tens of thousands of lines of text that I am parsing, and all of them give me similar results. Why is the regex picking a (seemingly) random "; to stop at? I can fix the example above by making the regex more specific, e.g. r'msg:"([\w\s\.]+)";', but other lines contain different characters. I guess I could just include every special character in my regex, but I'm trying to understand why my wildcard isn't working properly.

Any help would be appreciated!

CodePudding user response：

Try this one:

re.search(r'msg:"([^;]+)";', line)
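
A quick sketch of why the negated character class works, and where it might not. The sample lines here are shortened and made up for illustration, not taken from the real rule set. Since [^;]+ can never cross a semicolon, the match has to end at the first "; after msg:". The catch is that a message containing a semicolon would not match at all, so [^"]+ may be the safer choice if that can happen in your data:

```python
import re

# Shortened, hypothetical rule lines (assumptions for illustration only).
line = 'msg:"ET WEB Attempt"; flow:established,to_server; content:"GET"; sid:1;'
line_with_semicolon = 'msg:"Odd; message"; content:"GET"; sid:2;'

# [^;]+ cannot cross a semicolon, so the capture ends before the first ";
by_semicolon = re.search(r'msg:"([^;]+)";', line)
print(by_semicolon.group(1))

# A semicolon inside the message defeats [^;]+ entirely: no match is found.
missed = re.search(r'msg:"([^;]+)";', line_with_semicolon)
print(missed)

# [^"]+ stops at the closing quote instead, so semicolons in the message are fine.
by_quote = re.search(r'msg:"([^"]+)";', line_with_semicolon)
print(by_quote.group(1))
```

Neither pattern handles a message that itself contains a double quote, so check a few unusual lines before running this over the whole file.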

CodePudding user response：

The .+ is "greedy" by default, i.e. it will match as many characters as possible. In your case, it keeps going until the last "; sequence in the line, not the next one. To make it non-greedy (or lazy), use .+? instead:

 msgSearch = re.search(r'msg:"(.+?)";', line)
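
To see the difference side by side, here is a minimal sketch using a shortened version of the line from the question (the trailing options are trimmed, and extract_msg is a hypothetical helper name, not something from the original code):

```python
import re

# Shortened version of the rule line from the question.
line = ('msg:"ET WEB_SPECIFIC_APPS ClarkConnect Linux proxy.php XSS Attempt"; '
        'flow:established,to_server; content:"GET"; content:"script"; nocase; '
        'sid:2010602;')

# Greedy: .+ grabs as much as it can, then backtracks to the LAST ";
greedy = re.search(r'msg:"(.+)";', line)

# Lazy: .+? grabs as little as it can, so it stops at the FIRST ";
lazy = re.search(r'msg:"(.+?)";', line)

print(greedy.group(1))
print(lazy.group(1))

# Hypothetical helper for parsing many lines: compile once, reuse the pattern.
MSG_PATTERN = re.compile(r'msg:"(.+?)";')

def extract_msg(text):
    """Return the msg value from one rule line, or None if there is none."""
    match = MSG_PATTERN.search(text)
    return match.group(1) if match else None

print(extract_msg(line))
```

The greedy capture runs through to content:"script, while the lazy one ends at Attempt, which matches the expected result in the question.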