KVP extraction but allow the separator in the value

Time:09-27

I have a string of key-value pairs separated by a space character (I believe only ever a single one), but the values may themselves contain spaces and other whitespace (e.g. newlines, tabs). For example:

a=1 b=cat c=1 and 2 d=3

becomes:

  • a=1
  • b=cat
  • c=1 and 2
  • d=3

i.e. I want to extract all the pairs as groups.

I cannot figure out the regex. My sample doesn't include a newline, but that could also occur.

I've tried the basics like:

(.+?=.+?)

\s?([^\s]+)

but these fail with spaces and newlines. I'm also writing code around this, so I can tidy up any leading/trailing characters where needed; I would just rather do it with a regex than scan one character at a time.

CodePudding user response:

You can use

([^\s=]+)=([\w\W]*?)(?=\s+[^\s=]+=|$)

Details:

  • ([^\s=]+) - Group 1: one or more chars other than whitespace and the = char
  • = - a = char
  • ([\w\W]*?) - Group 2: any zero or more chars, as few as possible
  • (?=\s+[^\s=]+=|$) - a positive lookahead that requires, immediately to the right of the current location, either one or more whitespace chars followed by one or more chars other than whitespace and = and then a = char, or the end of the string.

Rather than [\w\W], a cleaner way to match any character is to use . together with the singleline/dotall modifier (if your regex flavor supports it; see "How do I match any character across multiple lines in a regular expression?"). For example:

(?s)([^\s=]+)=(.*?)(?=\s+[^\s=]+=|$)
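
As a minimal sketch of how the dotall pattern could be applied, here it is in Python. The question does not say which language is in use, so Python is an assumption, and the helper name extract_pairs is hypothetical. Note that a value which itself contains whitespace followed by something like word= would be split at that point, since the lookahead treats it as the start of the next pair.

```python
import re

# The dotall pattern from the answer; (?s) lets "." match newlines too.
PAIR_RE = re.compile(r"(?s)([^\s=]+)=(.*?)(?=\s+[^\s=]+=|$)")


def extract_pairs(text):
    """Return (key, value) tuples; extract_pairs is a hypothetical helper name."""
    return [(m.group(1), m.group(2)) for m in PAIR_RE.finditer(text)]


if __name__ == "__main__":
    # The sample from the question.
    print(extract_pairs("a=1 b=cat c=1 and 2 d=3"))
    # A value spanning a newline, as the question says may happen.
    print(extract_pairs("a=1\nb=line one\nline two c=3"))
```

Each match gives the key in group 1 and the untrimmed value in group 2, so any extra tidying of leading or trailing characters can still be done in code afterwards.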