Problem with regex creation if escape character is at the end of the parameter value

Time:05-17

I get three parameters in a string. Each parameter is written in the form: Quotes, Name, Quotes, Equals sign, Quotes, Text, Quotes. The parameter separator is a space. Example 1:

"param1"="Peter" "param2"="Harald" "param3"="Marie"

With java.util.regex.Matcher I can find any name and text by the following regex:

"([^"]*)"\s*=\s*"([^"]*)"
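For reference, this is roughly how the regex above can be applied with java.util.regex.Matcher. It is only a minimal sketch: the class name SimpleParamDemo and the method parse are illustrative and not part of the original code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleParamDemo {
    // The regex from above as a Java string literal:
    // every " is escaped for the string, and \s is written as \\s.
    static final Pattern PARAM =
            Pattern.compile("\"([^\"]*)\"\\s*=\\s*\"([^\"]*)\"");

    // Returns name -> text for every parameter found, in input order.
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PARAM.matcher(input);
        while (m.find()) {
            result.put(m.group(1), m.group(2)); // group 1 = name, group 2 = text
        }
        return result;
    }

    public static void main(String[] args) {
        String example1 =
                "\"param1\"=\"Peter\" \"param2\"=\"Harald\" \"param3\"=\"Marie\"";
        // prints {param1=Peter, param2=Harald, param3=Marie}
        System.out.println(parse(example1));
    }
}
```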

Now, however, the text may contain a quotation mark, which is escaped with a backslash. Example 2:

"param1"="Peter" "param2"="Har\"ald" "param3"="Marie" 

I have built the following regex:

"([^"]*)"\s*=\s*("([^"]*(\\")*[^"]*)*[^\\]")

This works well for example 2, but is not a universal solution.

If the backslash is at the end of a parameter-value, the solution does not work anymore. Example 3:

"param1"="Peter" "param2"="Harald\" "param3"="Marie"

If the backslash is at the end of the value, the matcher interprets "Harald\" " as the value of parameter 2 instead of "Harald\".
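The behaviour can be reproduced with a small Java sketch like the one below. The class name TrailingBackslashDemo and the method values are illustrative only. It runs the second regex against example 3 and prints group 2 of every match in brackets, so the extra space and quote in the value of parameter 2 become visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingBackslashDemo {
    // The second regex from above as a Java string literal.
    static final Pattern PARAM = Pattern.compile(
            "\"([^\"]*)\"\\s*=\\s*(\"([^\"]*(\\\\\")*[^\"]*)*[^\\\\]\")");

    // Collects group 2 (the quoted value) of every match.
    static List<String> values(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = PARAM.matcher(input);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Example 3: the value of param2 ends with a backslash.
        String example3 =
                "\"param1\"=\"Peter\" \"param2\"=\"Harald\\\" \"param3\"=\"Marie\"";
        for (String v : values(example3)) {
            System.out.println("[" + v + "]");
        }
    }
}
```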

Do you have a universal solution for this problem? Thanks in advance for your input.

Kind regards Dominik

CodePudding user response:

You may use this regex in Java:

\"([^\"]*)\"\h*=\h*(\"[^\\\"]*(?:\\(?=\"(?:\h|$))|(?:\\.[^\\\"]*))*\")

RegEx Details:

  • \"([^\"]*)\": Match a quoted string as the parameter name (capture group #1)
  • \h*=\h*: Match = surrounded by optional horizontal whitespace
  • (: Start capture group #2
  • \": Match opening "
  • [^\\\"]*: Match 0 or more non-quote, non-backslash characters
  • (?:: Start non-capture group
    • \\: Match a \
    • (?=\"(?:\h|$)): Must be followed by a " that has a horizontal whitespace or the end of the line after it
    • |: OR
    • (?:\\.[^\\\"]*): Match an escaped character followed by 0 or more non-quote, non-backslash characters
  • )*: End non-capture group, repeated 0 or more times
  • \": Match closing "
  • ): End capture group #2
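
In Java source code every backslash of the pattern has to be doubled inside the string literal. The following is a minimal sketch of how the pattern could be used. The class name EscapedParamParser and the method parse are made up for illustration, and the pattern is split into commented pieces only for readability.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedParamParser {
    static final Pattern PARAM = Pattern.compile(
            "\"([^\"]*)\""              // quoted name -> group 1
          + "\\h*=\\h*"                 // = with optional horizontal whitespace
          + "(\"[^\\\\\"]*"             // group 2: opening " and plain characters
          + "(?:\\\\(?=\"(?:\\h|$))"    // a \ directly before the closing "
          + "|(?:\\\\.[^\\\\\"]*))*"    // or an escaped character plus plain characters
          + "\")");                     // closing "

    // Returns name -> quoted value (group 2 keeps the surrounding quotes).
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PARAM.matcher(input);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String example2 =
                "\"param1\"=\"Peter\" \"param2\"=\"Har\\\"ald\" \"param3\"=\"Marie\" ";
        String example3 =
                "\"param1\"=\"Peter\" \"param2\"=\"Harald\\\" \"param3\"=\"Marie\"";
        System.out.println(parse(example2).get("param2")); // prints "Har\"ald"
        System.out.println(parse(example3).get("param2")); // prints "Harald\"
    }
}
```

Note that the lookahead treats every \" that is followed by whitespace or the end of the input as the end of a value, so a value that contains an escaped quote directly followed by a space would be cut off at that point.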