I receive three parameters in a single string. Each parameter has the form: quote, name, quote, equals sign, quote, text, quote. Parameters are separated by a space. Example 1:
"param1"="Peter" "param2"="Harald" "param3"="Marie"
With java.util.regex.Matcher I can find any name and text by the following regex:
"([^"]*)"\s*=\s*"([^"]*)"
Now, however, the text may contain a quotation mark, which is escaped with a backslash. Example 2:
"param1"="Peter" "param2"="Har\"ald" "param3"="Marie"
I have built the following regex:
"([^"]*)"\s*=\s*("([^"]*(\\")*[^"]*)*[^\\]")
This works well for example 2, but is not a universal solution.
If the backslash is at the end of a parameter value, the solution no longer works. Example 3:
"param1"="Peter" "param2"="Harald\" "param3"="Marie"
If the backslash is at the end of the value, the matcher interprets "Harald\" " as the value of parameter 2 instead of "Harald\".
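The behaviour described above can be reproduced with a small sketch. The class name `TrailingBackslashRepro` and the helper `findAll` are hypothetical names chosen for illustration; the pattern is the regex built for example 2, written as a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrailingBackslashRepro {
    // The regex built for example 2, as a Java string literal
    // (every regex backslash is doubled, every quote is escaped).
    static final Pattern PARAM = Pattern.compile(
            "\"([^\"]*)\"\\s*=\\s*(\"([^\"]*(\\\\\")*[^\"]*)*[^\\\\]\")");

    // Hypothetical helper: returns "name -> value" for every match,
    // where value is group 2 including its surrounding quotes.
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = PARAM.matcher(input);
        while (m.find()) {
            found.add(m.group(1) + " -> " + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        // Example 3: the value of param2 ends with a backslash.
        String example3 = "\"param1\"=\"Peter\" \"param2\"=\"Harald\\\" \"param3\"=\"Marie\"";
        for (String line : findAll(example3)) {
            System.out.println(line);
        }
        // The line for param2 reads: param2 -> "Harald\" "
    }
}
```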
Do you have a universal solution for this problem? Thanks in advance for your input.
Kind regards Dominik
CodePudding user response:
You may use this regex in Java:
\"([^\"]*)\"\h*=\h*(\"[^\\\"]*(?:\\(?=\"(?:\h|$))|(?:\\.[^\\\"]*))*\")
RegEx details:
\"([^\"]*)\" : Match a quoted string as the parameter name (capture group #1)
\h*=\h* : Match = surrounded by optional horizontal whitespace
( : Start capture group #2
\" : Match the opening "
[^\\\"]* : Match 0 or more non-quote, non-backslash characters
(?: : Start non-capture group
\\ : Match a \
(?=\"(?:\h|$)) : Must be followed by a " that has whitespace or the end of the line after it
| : OR
(?:\\.[^\\\"]*) : Match an escaped character followed by 0 or more non-quote, non-backslash characters
)* : End non-capture group and repeat it 0 or more times
\" : Match the closing "
) : End capture group #2
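As a rough sketch of how the regex above might be used from Java code: inside a string literal every regex backslash has to be doubled, so \h becomes \\h and \\ becomes \\\\. The class name `EscapedParamParser` and the method `parse` are hypothetical, and stripping the outer quotes from the value is an assumption about what the caller wants, not part of the regex itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedParamParser {
    // The regex from this answer as a Java string literal.
    // The quotes are escaped only for the literal; in the regex they are plain ".
    private static final Pattern PARAM = Pattern.compile(
            "\"([^\"]*)\"\\h*=\\h*(\"[^\\\\\"]*(?:\\\\(?=\"(?:\\h|$))|(?:\\\\.[^\\\\\"]*))*\")");

    // Hypothetical helper: maps each parameter name (group 1) to its value
    // (group 2 with the surrounding quotes removed, which is an assumption).
    static Map<String, String> parse(String input) {
        Map<String, String> params = new LinkedHashMap<>();
        Matcher m = PARAM.matcher(input);
        while (m.find()) {
            String quoted = m.group(2);
            params.put(m.group(1), quoted.substring(1, quoted.length() - 1));
        }
        return params;
    }

    public static void main(String[] args) {
        String example2 = "\"param1\"=\"Peter\" \"param2\"=\"Har\\\"ald\" \"param3\"=\"Marie\"";
        String example3 = "\"param1\"=\"Peter\" \"param2\"=\"Harald\\\" \"param3\"=\"Marie\"";
        System.out.println(parse(example2).get("param2")); // Har\"ald
        System.out.println(parse(example3).get("param2")); // Harald\
    }
}
```

Note that a value such as "a\" b" stays ambiguous by construction: the lookahead treats a backslash followed by a quote and whitespace as the end of the value.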