Get key=value pairs from a string with regex

Time:05-07

I have a string that looks something like:

x = """\
names=['m','c'],  \
nmodes=2,  \
mus=[[-5.0,  -5.0],  \
[5.0,  5.0]],  \
sigmas=[[1.5,  1.5],  [2.1,  2.1]],  \
corrcoefs=[[[1.0,  -0.7],  [-0.7,  1.0]],  [[1.0,  0.7],  [0.7,  1.0]]],  \
covs=[[[2.25,  -1.5749999999999997],  [-1.5749999999999997,  2.25]],  [[4.41,  3.087],  [3.087,  4.41]]],  \
weights=[1.0,  3.0],  \
bounds={'m': (-inf,  inf),  'c': (-inf,  inf)}\
"""

I want to split it into key-value pairs, where "=" separates each key from its value and a comma "," separates one pair from the next.

I have tried the following using re:

import re

re.findall("(\S+)=(\[.*\]$|{.*}$|\S+)", x)

which gives:

[('names', "['m','c'],"),
 ('nmodes', '2,'),
 ('mus', '[[-5.0,'),
 ('sigmas', '[[1.5,'),
 ('corrcoefs', '[[[1.0,'),
 ('covs', '[[[2.25,'),
 ('weights', '[1.0,'),
 ('bounds', "{'m': (-inf,  inf),  'c': (-inf,  inf)}")]

but some of the lists get truncated. The output seems to change depending on the number of spaces after the commas within the lists, but I would like it to work with an arbitrary number of spaces after each comma.
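A small sketch of why the spacing matters, assuming the attempted pattern is the one shown above with `+` quantifiers. Without `re.MULTILINE`, `$` only matches at the very end of the string, so the bracketed alternatives can only succeed for a value that runs to the end of the input. Every other value falls through to `\S+`, which stops at the first whitespace character. The two sample strings below are made up for illustration:

```python
import re

# The attempted pattern from the question.
attempt = r"(\S+)=(\[.*\]$|{.*}$|\S+)"

# Same data, with and without spaces after the inner commas.
spaced = "mus=[[-5.0, -5.0], [5.0, 5.0]], nmodes=2"
compact = "mus=[[-5.0,-5.0],[5.0,5.0]], nmodes=2"

spaced_result = re.findall(attempt, spaced)
compact_result = re.findall(attempt, compact)

print(spaced_result)   # [('mus', '[[-5.0,'), ('nmodes', '2')]
print(compact_result)  # [('mus', '[[-5.0,-5.0],[5.0,5.0]],'), ('nmodes', '2')]
```

In both cases the list value is captured by `\S+`, so it is cut at the first space (and keeps the trailing comma when there is no space inside the list).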

CodePudding user response:

You can use

re.findall(r'(\w+)=(\[.*?]|{.*?}|\S+)(?=\s*,\s*\w+=|\Z)', text)

See the regex demo. Details:

  • (\w+) - Group 1: one or more word chars
  • = - a = char
  • (\[.*?]|{.*?}|\S+) - Group 2: [, any zero or more chars other than line break chars, as few as possible, ], or {, any zero or more chars other than line break chars, as few as possible, }, or one or more non-whitespace chars
  • (?=\s*,\s*\w+=|\Z) - a positive lookahead that requires, immediately to the right, either a comma surrounded by zero or more whitespace chars and followed by one or more word chars and =, or the end of the string.
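As a rough end-to-end check, the pattern can be run against a shortened version of the string from the question. The sample below is an assumption made for brevity: it keeps only some of the keys and uses deliberately irregular spacing after commas. The names `PAIR` and `pairs` are illustrative, not part of the answer.

```python
import re

# Shortened, single-line version of the string from the question;
# the spacing after commas is deliberately irregular.
text = (
    "names=['m','c'],  nmodes=2, mus=[[-5.0,   -5.0], [5.0,  5.0]],   "
    "weights=[1.0, 3.0],  bounds={'m': (-inf,  inf),  'c': (-inf, inf)}"
)

# Key, "=", then a bracketed, braced, or whitespace-free value that must be
# followed by ", nextkey=" or by the end of the string.
PAIR = re.compile(r'(\w+)=(\[.*?]|{.*?}|\S+)(?=\s*,\s*\w+=|\Z)')

# findall returns (key, value) tuples, which dict() accepts directly.
pairs = dict(PAIR.findall(text))

for key, value in pairs.items():
    print(f"{key} -> {value}")
```

The values stay as strings. Two caveats worth keeping in mind: `.` does not match line breaks, so this relies on the string being a single line (pass `re.DOTALL` if values can span lines), and a value that itself contains text shaped like `, word=` after a closing bracket would be cut short at that point.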