Python: normalizing and fixing a string

I have a log file that contains unstructured entries like this:

[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=1, text=XXXXXXrequestID=/1540], flow=10:1] [remote=0.0.0.0, host=xxx]
[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=2, text=XXXXXX., [code=2, text=XXXXXXrequestID=/1551], flow=12:3]

As you can see, the 'errors' value opens with '[[' but the brackets are never properly closed, which makes it hard to parse.

What I want to do is clean only the 'errors' part and fix it like this: replace '[]' with '{}' and remove duplicate keys from errors, so I can read it into a Python dict:

[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors={code=1, text=XXXXXX, requestID=/1540}, flow=10:1] [remote=0.0.0.0, host=xxx]
[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors={code=2, text=XXXXXX., requestID=/1551}, flow=12:3]

I'm not good at Python, but I tried with this poor code. I need your kind help to do it in an efficient way.

def fix(line):
    line = line.replace('errors=[[', 'errors={')
    ...
    return line

Thank you very much

CodePudding user response：

Using the re module from the standard library, you can perform more complex text manipulations.

The important part is to identify your patterns properly:

=[[  -->  ={
 [   -->  (a single space, i.e. the bracket is dropped)
]    -->  }

Of course, these patterns can be made more robust.

import re

log = """roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=1, text=XXXXXXrequestID=/1540], flow=10:1
roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=2, text=XXXXXX., [code=2, text=XXXXXXrequestID=/1551], flow=12:3"""

mapping = {1: '={', 2: ' ', 3: '}'}
regex = r'(=\[\[)|(\s\[)|(\])'  # the r prefix marks a raw string, not a regex

log_new = re.sub(regex, lambda match: mapping[match.lastindex], log)

print(log_new)
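
A note on why the lambda works: match.lastindex is the index of the last capturing group that took part in the match. Since each alternative in the pattern is its own top-level group, it tells you which pattern fired, and the dict supplies the replacement. A small sketch on a made-up string (the sample input and the replace_brackets helper are only for illustration, they are not part of the log):

```python
import re

# Each alternative is its own capturing group, so match.lastindex
# identifies which of the three patterns matched.
mapping = {1: '={', 2: ' ', 3: '}'}
regex = r'(=\[\[)|(\s\[)|(\])'


def replace_brackets(text):
    """Replace every matched bracket pattern with its mapped string."""
    return re.sub(regex, lambda m: mapping[m.lastindex], text)


sample = 'errors=[[code=1, text=a], flow=1'  # hypothetical input
fixed = replace_brackets(sample)
print(fixed)  # errors={code=1, text=a}, flow=1
```

Keeping the replacements in a dict means a new pattern only needs one more group in the regex and one more entry in the mapping.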

Answer to the EDITED question

log = """[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=1, text=XXXXXXrequestID=/1540], flow=10:1] [remote=0.0.0.0, host=xxx]
[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=2, text=XXXXXX., [code=2, text=XXXXXXrequestID=/1551], flow=12:3]"""

import re

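# group 1: a repeated "[code=N, " is dropped
# group 2: the leading run of "[" before "code" becomes "{code"
# group 3: the closing "], " becomes "}, "
mapping = {1: '', 2: '{code', 3: '}, '}
regex = r'(\[code=\d+,\s)|(\[+code)|(\],\s)'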

log_new = re.sub(regex, lambda match: mapping[match.lastindex], log)

# insert the missing ", " between the text value and requestID
log_new = re.sub(r'(text=.+?)(requestID=)', r'\1, \2', log_new)

print(log_new)
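
If the duplicate keys should be dropped as well and the errors part loaded into a dict, one possible approach is to pull the errors body out first and rebuild it field by field. This is only a sketch under a few assumptions taken from the two sample lines: the errors value runs from 'errors=[[' to the first ']', fields are separated by ', ', no value contains ', ' itself, and the first occurrence of a key is the one to keep. The names ERRORS_RE, parse_errors and fix_line are made up for this example.

```python
import re

# Assumption: the errors value runs from 'errors=[[' up to the first ']'.
ERRORS_RE = re.compile(r'errors=\[\[(.*?)\]')


def parse_errors(line):
    """Return the errors fields of one log line as a dict (first value wins)."""
    match = ERRORS_RE.search(line)
    if match is None:
        return {}
    body = match.group(1).replace('[', '')
    # requestID is glued to the text value, so separate it first.
    body = re.sub(r'(?<!, )requestID=', ', requestID=', body)
    fields = {}
    for pair in body.split(', '):
        key, sep, value = pair.partition('=')
        if sep:
            fields.setdefault(key, value)  # keep the first occurrence only
    return fields


def fix_line(line):
    """Rewrite the errors part of a line as errors={key=value, ...}."""
    fields = parse_errors(line)
    if not fields:
        return line
    fixed = ', '.join(f'{k}={v}' for k, v in fields.items())
    return ERRORS_RE.sub(lambda m: 'errors={' + fixed + '}', line, count=1)


line = ('[roomID=19, description=ZZZZ, requesterCode=20, result=-1, '
        'errors=[[code=2, text=XXXXXX., [code=2, text=XXXXXXrequestID=/1551], '
        'flow=12:3]')
print(parse_errors(line))
print(fix_line(line))
```

All values come back as strings, so convert them (for example with int()) if you need numbers.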