Regular expression to match a value, optionally enclosed in parenthesis

Time:08-11

Someone help me out with a regular expression please.

What I want to match:

="..."
='...'
=...
="...'..."
='..."...'

but not:

="...'
='..."

I want to match a value, optionally enclosed in one of two parenthesis types (" or '). Both sides should be terminated by the same type of parenthesis, or by nothing. The value itself may not contain the type of parenthesis it is enclosed in. If it lacks outer parentheses, it may not contain any parenthesis at all. I have come this far:

\=[\x22\x27]{0,1}([^\x22\x27]*?)[\x22\x27]{0,1}$

\x22 is " and \x27 is '
\= and $ are just for clarification. The actual borders are defined differently.

The first problem is that the trailing [\x22\x27]{0,1} should match the same character as the leading [\x22\x27]{0,1}.
The second problem is that the middle term [^\x22\x27] should exclude only the character that the outer terms actually matched.

CodePudding user response:

Alternative suggestion that requires fewer RegEx features:

=([^'"\n]*|(?:"[^"\n]*")|(?:'[^'\n]*'))$

https://regex101.com/r/QqbbcQ/1

The boundaries = and $ are the same as in your RegEx. The (?:...) are non-capturing groups. I've used ' and " instead of the hex escapes for readability; you might have to escape them if your language doesn't provide raw string / RegEx literals.

At the core of the RegEx is the "choice" between the three possible "string literals":

  • [^'"\n]*: String without delimiters, no quotes allowed;
  • "[^"\n]*": Double-quoted string;
  • '[^'\n]*': Single-quoted string

All character sets exclude (UNIX) newlines, since string literals may not span lines; consider adding CR (\r) as well if you also want to forbid classic Mac (CR) and Windows (CRLF) line endings.
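
As a quick check of the three-way choice described above, here is a minimal Python sketch that runs the pattern against the cases from the question. The names VALUE_RE and is_valid are made up for illustration, and the concrete sample values stand in for the "..." placeholders; the pattern itself is the one from this answer.

```python
import re

# Pattern from this answer: an unquoted, double-quoted, or single-quoted value.
# A raw triple-quoted string avoids having to escape ' and " in Python.
VALUE_RE = re.compile(r"""=([^'"\n]*|(?:"[^"\n]*")|(?:'[^'\n]*'))$""")


def is_valid(text: str) -> bool:
    """Return True if text is '=' followed by a well-formed value (hypothetical helper)."""
    return VALUE_RE.fullmatch(text) is not None


if __name__ == "__main__":
    samples = ['="abc"', "='abc'", "=abc", '="a\'b"', "='a\"b'", '="abc\'', "='abc\""]
    for sample in samples:
        print(f"{sample!r}: {is_valid(sample)}")
```

The first five samples should be accepted and the last two (mismatched delimiters) rejected, because no single alternative can consume an opening delimiter without also finding its matching closing one.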

CodePudding user response:

^=(?:(?:"([^"\r\n]+)")|(?:'([^'\r\n]+)')|(?:([^"'\r\n]+)))$

https://regex101.com/r/bZc1YL/1

CodePudding user response:

You can use

=(?:([\x22\x27])(?=.*\1$))?((?:(?!\1).)*?)\1?$

See this regex demo. Details:

  • = - a = char (it should not be escaped)
  • (?:([\x22\x27])(?=.*\1$))? - an optional occurrence of either a ' or " char captured into Group 1 (\1), which must be followed by any string of non-linebreak chars ending, at the end of the string, with the same char as captured in Group 1
  • ((?:(?!\1).)*?) - Group 2: any char other than line break chars, zero or more but as few as possible occurrences, that is not equal to the Group 1 value
  • \1? - Group 1 value (optional, for cases when Group 1 was not matched)
  • $ - end of string.

Another idea is using a conditional:

=([\x22\x27])?((?:(?!\1).)*?)(?(1)\1)$

See this regex demo.

Here, (?(1)\1) matches Group 1 value if Group 1 matched. Else, nothing is required at this location in the string.

Note for ECMAScript regex flavor

In this flavor (used in JavaScript, for example), a backreference to a group that did not participate in the match matches an empty string instead of failing. As a consequence, (?!\1) can never succeed when Group 1 is unset, and the technique used in the first regex does not work for cases like =123.

In that case, you will have to resort to an alternation like

=(?:"(.*)"|'(.*)'|(.*))$

and use extra code logic to determine which group value you need to extract.
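
The "extra code logic" can be as small as picking the first group that took part in the match. The sketch below shows the idea in Python purely for illustration (the names FALLBACK_RE and extract_value are hypothetical); in JavaScript the same loop would test for undefined instead of None.

```python
import re
from typing import Optional

# The alternation fallback from above: one capture group per delimiter style.
FALLBACK_RE = re.compile(r"""=(?:"(.*)"|'(.*)'|(.*))$""")


def extract_value(text: str) -> Optional[str]:
    """Return the captured value, or None if text has no '=value' part."""
    match = FALLBACK_RE.search(text)
    if match is None:
        return None
    # Exactly one of the three groups participates in a successful match.
    # Compare against None rather than truthiness so an empty value ("") survives.
    for group in match.groups():
        if group is not None:
            return group
    return None


if __name__ == "__main__":
    for sample in ['="abc"', "='abc'", "=abc", '=""']:
        print(f"{sample!r} -> {extract_value(sample)!r}")
```

Note that this pattern does not reject mismatched delimiters on its own: an input such as ="abc' falls through to the third group with the stray characters included, so any stricter validation would have to happen in the surrounding code.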
