I have a string that looks something like:
x = """\
names=['m','c'], \
nmodes=2, \
mus=[[-5.0, -5.0], \
[5.0, 5.0]], \
sigmas=[[1.5, 1.5], [2.1, 2.1]], \
corrcoefs=[[[1.0, -0.7], [-0.7, 1.0]], [[1.0, 0.7], [0.7, 1.0]]], \
covs=[[[2.25, -1.5749999999999997], [-1.5749999999999997, 2.25]], [[4.41, 3.087], [3.087, 4.41]]], \
weights=[1.0, 3.0], \
bounds={'m': (-inf, inf), 'c': (-inf, inf)}\
"""
I want to split it up into key-value pairs using "=" as the separator and where each pair is separated by a comma ","
.
I have tried the following using re:
```python
import re
re.findall(r"(\S+)=(\[.*\]$|{.*}$|\S+)", x)
```
which gives:
```
[('names', "['m','c'],"),
 ('nmodes', '2,'),
 ('mus', '[[-5.0,'),
 ('sigmas', '[[1.5,'),
 ('corrcoefs', '[[[1.0,'),
 ('covs', '[[[2.25,'),
 ('weights', '[1.0,'),
 ('bounds', "{'m': (-inf, inf), 'c': (-inf, inf)}")]
```
But some of the lists get truncated. The output seems to depend on the number of spaces after the commas inside the lists, and I would like it to work with an arbitrary number of spaces after commas.
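To see why the values get cut off, here is the attempted pattern (with the `+` quantifiers restored) run on two hypothetical sample strings. `\S+` stops at the first whitespace character, and `\[.*\]$` can only match a list that runs to the very end of the string, so a list value is captured whole only when it contains no spaces, and then it drags the trailing comma along with it.

```python
import re

# The pattern from the question, with the '+' quantifiers restored.
attempt = r"(\S+)=(\[.*\]$|{.*}$|\S+)"

# Hypothetical inputs: the same list with and without spaces after its commas.
spaced = "mus=[[-5.0, -5.0], [5.0, 5.0]], nmodes=2"
compact = "mus=[[-5.0,-5.0],[5.0,5.0]], nmodes=2"

# \S+ stops at the first space, so the list value is cut off after '[[-5.0,'.
print(re.findall(attempt, spaced))
# With no spaces inside the list, \S+ runs up to the space after the comma,
# so the whole list is captured, but together with the trailing comma.
print(re.findall(attempt, compact))
```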
You can use

```python
re.findall(r'(\w+)=(\[.*?]|{.*?}|\S+)(?=\s*,\s*\w+=|\Z)', x)
```
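For reference, this is the same call applied to the string `x` from the question. Collecting the result into a dict named `params` is an addition for illustration; the values are still plain strings.

```python
import re

# The input string from the question; each backslash-newline inside the
# triple-quoted literal is a line continuation, so x is a single line.
x = """\
names=['m','c'], \
nmodes=2, \
mus=[[-5.0, -5.0], \
[5.0, 5.0]], \
sigmas=[[1.5, 1.5], [2.1, 2.1]], \
corrcoefs=[[[1.0, -0.7], [-0.7, 1.0]], [[1.0, 0.7], [0.7, 1.0]]], \
covs=[[[2.25, -1.5749999999999997], [-1.5749999999999997, 2.25]], [[4.41, 3.087], [3.087, 4.41]]], \
weights=[1.0, 3.0], \
bounds={'m': (-inf, inf), 'c': (-inf, inf)}\
"""

pattern = r"(\w+)=(\[.*?]|{.*?}|\S+)(?=\s*,\s*\w+=|\Z)"
pairs = re.findall(pattern, x)

# Index the (key, value) string pairs by key; no parsing of the values happens here.
params = dict(pairs)
for key, value in params.items():
    print(f"{key} = {value}")
```

Nested values such as `mus` and `covs` come out whole, e.g. `params["mus"]` is the string `'[[-5.0, -5.0], [5.0, 5.0]]'`.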
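See the regex demo. Details:

- `(\w+)` - Group 1: one or more word chars
- `=` - a `=` char
- `(\[.*?]|{.*?}|\S+)` - Group 2: `[`, any zero or more chars other than line break chars, as few as possible, and `]`; or `{`, any zero or more chars other than line break chars, as few as possible, and `}`; or one or more non-whitespace chars
- `(?=\s*,\s*\w+=|\Z)` - a positive lookahead that requires either a comma enclosed with zero or more whitespace chars and followed by one or more word chars and `=`, or the end of the string.

Because the lazy `.*?` can only stop at a position where this lookahead succeeds, a list value keeps extending past its inner `]` characters until the next `key=` (or the end of the string), so nested lists are captured whole no matter how many spaces follow the commas inside them.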
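If you would rather not rely on the lookahead heuristic (it would misfire, for example, if a value itself contained text like `, key=`), a different technique is to split only on commas that are not nested inside brackets. The sketch below is a hypothetical helper, `split_top_level`, not part of any library; it assumes brackets are balanced and never appear inside quoted strings.

```python
def split_top_level(s):
    """Split 'k1=v1, k2=v2, ...' into (key, value) pairs.

    Commas only separate pairs when they are not nested inside (), [] or {}.
    Assumes balanced brackets and no brackets inside quoted strings.
    """
    chunks = []
    depth = 0
    start = 0
    for i, ch in enumerate(s):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch == "," and depth == 0:
            # A top-level comma ends the current key=value chunk.
            chunks.append(s[start:i])
            start = i + 1
    chunks.append(s[start:])

    pairs = []
    for chunk in chunks:
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing comma
        # Split on the first '=' only; any later '=' stays in the value.
        key, _, value = chunk.partition("=")
        pairs.append((key.strip(), value.strip()))
    return pairs


# Hypothetical input with irregular spacing after commas, inside and between values.
sample = "mus=[[-5.0,-5.0],   [5.0,  5.0]],nmodes=2,   bounds={'m': (-inf, inf)}"
print(split_top_level(sample))
```

Unlike the `\S+` branch of the regex, this also accepts scalar values that contain spaces, since it never looks at whitespace except to strip it from the ends of keys and values.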