I have a string of key=value pairs separated by a space character (I believe there will only ever be one), but the value may itself contain spaces and other whitespace (e.g. newlines, tabs), e.g.
a=1 b=cat c=1 and 2 d=3
becomes:
- a=1
- b=cat
- c=1 and 2
- d=3
i.e. I want to extract all the pairs as groups.
I cannot figure out the regex. My sample doesn't include a newline, but that could also happen.
I've tried the basics like:
(.+?=.+?)
\s?([^\s]+)
but these fail with spaces and newlines. I'm also writing code around this, so I can tidy up any leading/trailing characters where needed; I'd just rather do it with a regex than scan one character at a time.
CodePudding user response:
You can use
([^\s=]+)=([\w\W]*?)(?=\s+[^\s=]+=|$)
See the regex demo. Details:
- ([^\s=]+) - Group 1: one or more chars other than whitespace and =
- = - a = char
- ([\w\W]*?) - Group 2: any zero or more chars, as few as possible
- (?=\s+[^\s=]+=|$) - a positive lookahead that requires one or more whitespaces followed by one or more chars other than whitespace and =, followed by =, or the end of the string, immediately to the right of the current location.
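As a quick check, here is a minimal Python sketch that applies this pattern to the sample string from the question. Python is only an assumption here (the question does not name a language), and the names PAIR_RE, sample and pairs are made up for illustration.

```python
import re

# The pattern from the answer: key, "=", then a lazy value that stops
# right before the next "<whitespace>key=" or at the end of the string.
PAIR_RE = re.compile(r"([^\s=]+)=([\w\W]*?)(?=\s+[^\s=]+=|$)")

sample = "a=1 b=cat c=1 and 2 d=3"

# findall returns one (key, value) tuple per match because the pattern
# has exactly two capturing groups.
pairs = PAIR_RE.findall(sample)

for key, value in pairs:
    print(f"{key}={value}")
```

Note that a value which itself contains something shaped like " key=" (whitespace, non-space chars, then =) would be split at that point, since the lookahead cannot tell it apart from the start of the next pair.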
A better way to match any character than [\w\W] is to use . together with the singleline/dotall modifier (if supported, see How do I match any character across multiple lines in a regular expression?). Here is an example:
(?s)([^\s=]+)=(.*?)(?=\s+[^\s=]+=|$)
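A small sketch of the dotall variant, again assuming Python, where the inline (?s) flag at the start of the pattern is supported (passing re.DOTALL to re.compile has the same effect). The helper name parse_pairs is hypothetical, and the test string adds a newline and a tab inside a value to show that . now crosses line breaks.

```python
import re

# Inline (?s) makes "." match newlines too, so values may span lines.
DOTALL_PAIR_RE = re.compile(r"(?s)([^\s=]+)=(.*?)(?=\s+[^\s=]+=|$)")


def parse_pairs(text):
    """Return a dict mapping each key to its raw value (hypothetical helper)."""
    return {m.group(1): m.group(2) for m in DOTALL_PAIR_RE.finditer(text)}


# The value of "c" contains a newline and a tab.
text = "a=1 b=cat c=1\nand\t2 d=3"
result = parse_pairs(text)
print(result)
```

A dict keeps only the last value if a key repeats; if duplicate keys matter, collecting the matches in a list of tuples instead would preserve all of them.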