How to handle " in Regex Python

Time:10-11

I am trying to extract fary_trigger_post from the string in the code below using a regex. However, I don't understand why the captured group always includes a trailing " at the end, which I don't expect. Any ideas or suggestions?

import re

re.match(
    r'-instance[ "\']*(.+)[ "\']*$',
    '-instance "fary_trigger_post" '.strip(),
    flags=re.S).group(1)
# 'fary_trigger_post"'

Thank you.

Answer:

The (.+) is greedy and grabs every character up to the end of the input. Because the trailing [ "\']* is allowed to match zero characters, nothing forces the group to give the closing quote back. If you modified your input to include characters after the final double quote (e.g. '-instance "fary_trigger_post" asdf'), you would find the double quote and the remaining characters in the output (e.g. fary_trigger_post" asdf). Instead of .+ you should try [^"\']+ to capture every character except the quotes. This should return what you expect.

import re

re.match(r'-instance[ "\']*([^"\']+)[ "\'].*$', '-instance "fary_trigger_post" '.strip(), flags=re.S).group(1)
# 'fary_trigger_post'

Also note that I changed the end of the expression to [ "\'].*$: one required quote (or space) followed by .*, which matches any characters after the closing quote.
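A minimal sketch to check the claims in this answer. The diagnostic pattern, which wraps the question's trailing [ "\']* in a second group, is a hypothetical variant added only to show what each part matched; the input ending in asdf is the example from the answer.

```python
import re

# Hypothetical diagnostic variant of the question's pattern: the trailing
# [ "\']* is wrapped in a second group so we can see what it matched.
diagnostic = r'-instance[ "\']*(.+)([ "\']*)$'
m = re.match(diagnostic, '-instance "fary_trigger_post"', flags=re.S)
print(m.groups())  # ('fary_trigger_post"', '') -> the trailing class matched nothing

question = r'-instance[ "\']*(.+)[ "\']*$'      # pattern from the question
answer = r'-instance[ "\']*([^"\']+)[ "\'].*$'  # pattern from this answer

for text in ['-instance "fary_trigger_post"',        # the stripped input
             '-instance "fary_trigger_post" asdf']:  # the answer's example
    print(repr(re.match(question, text, flags=re.S).group(1)),
          repr(re.match(answer, text, flags=re.S).group(1)))
# 'fary_trigger_post"' 'fary_trigger_post'
# 'fary_trigger_post" asdf' 'fary_trigger_post'
```

The empty second group is the whole story: the greedy (.+) reached the end of the string first, and an optional trailing class has no reason to make it back off.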

Answer:

Here's the pattern I'd use, although it's hard to give a better answer without knowing all of your cases:

r'-instance\s+"(.+)"\s*$'
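A quick sketch of how this pattern behaves. Only the first input comes from the question; the single-quoted and two-quoted-value inputs are assumed extra cases, included to show the kind of edge case the answer says it cannot anticipate.

```python
import re

pattern = r'-instance\s+"(.+)"\s*$'  # pattern from this answer

# Case from the question: the closing " is a literal outside the group,
# so the greedy (.+) has to give it back for the match to succeed.
print(re.match(pattern, '-instance "fary_trigger_post" '.strip()).group(1))
# fary_trigger_post

# Assumed extra case: single quotes are not accepted at all.
print(re.match(pattern, "-instance 'fary_trigger_post'"))
# None

# Assumed extra case: (.+) is still greedy, so it runs to the LAST quote.
print(re.match(pattern, '-instance "a" "b"').group(1))
# a" "b
```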

Answer:

When you ask for group 1 (i.e. (.+)), the regex engine extends this match to the end of the string: . matches any character, + lets it repeat one or more times, and because the quantifier is greedy it takes as many characters as it can. I would suggest the following pattern:

'-instance[ "\']*(.+)["\']+ *$'

This forces the regex to match the trailing quotes (at least one) and any trailing spaces outside the group, so they are not included in group 1.
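A sketch comparing this answer's pattern with the question's on the input without .strip() (an assumed variation; the question strips first). The required ["\']+ makes the greedy (.+) backtrack until the closing quote is free, and the " *" part then absorbs the trailing space.

```python
import re

question = r'-instance[ "\']*(.+)[ "\']*$'  # pattern from the question
answer = '-instance[ "\']*(.+)["\']+ *$'    # pattern from this answer

raw = '-instance "fary_trigger_post" '  # trailing space kept: no .strip()

# The question's pattern keeps both the quote and the trailing space.
print(repr(re.match(question, raw).group(1)))  # 'fary_trigger_post" '

# ["\']+ needs at least one quote, so (.+) backtracks until the closing "
# is free; ' *' then consumes the trailing space before $.
print(repr(re.match(answer, raw).group(1)))    # 'fary_trigger_post'
```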
