I have a log file that contains unstructured entries like this:
[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=1, text=XXXXXXrequestID=/1540], flow=10:1] [remote=0.0.0.0, host=xxx]
[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=2, text=XXXXXX., [code=2, text=XXXXXXrequestID=/1551], flow=12:3]
As you can see, the 'errors' value starts with '[' but does not have a matching closing bracket, which makes it harder to parse.
What I want to do is clean only the 'errors' part and fix it like this: replacing '[]' with '{}' and removing duplicate keys from errors, so I can read it into a Python dict:
[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors={code=1, text=XXXXXX, requestID=/1540}, flow=10:1] [remote=0.0.0.0, host=xxx]
[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors={code=2, text=XXXXXX., requestID=/1551}, flow=12:3]
I'm not good at Python, but I tried with this poor code. I need your kind help to do it in an efficient way.
def fix(str):
    str = str.replace('errors=[[', 'errors={')
    ...
    return str
Thank you very much
CodePudding user response:
Using the re module from the standard library, you can perform more complex text manipulations.
The important part is to identify your patterns properly:
=[[ --> ={
[ preceded by whitespace --> a single space
] --> }
Of course, these patterns can be made more robust.
import re
log = """roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=1, text=XXXXXXrequestID=/1540], flow=10:1
roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=2, text=XXXXXX., [code=2, text=XXXXXXrequestID=/1551], flow=12:3"""
mapping = {1: '={', 2: ' ', 3: '}'}
regex = r'(=\[\[)|(\s\[)|(\])' # r stands for raw string not for regex!
log_new = re.sub(regex, lambda match: mapping[match.lastindex], log)
print(log_new)
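To make the role of match.lastindex in the code above more concrete, here is a minimal sketch that only records which capture group fired for each hit. The sample string and the names pattern, sample and hits are made up for illustration.

```python
import re

# Same three alternatives as above, one capture group per alternative.
pattern = re.compile(r'(=\[\[)|(\s\[)|(\])')

# Shortened, hypothetical sample in the same shape as the log entries.
sample = 'errors=[[code=2, text=A., [code=2, text=B], flow=12:3'

# For every hit, store the matched text and the index of the group that matched.
hits = [(m.group(0), m.lastindex) for m in pattern.finditer(sample)]
print(hits)
```

Since only one alternative can match at a time, lastindex is the number of that alternative, which is why it can be used directly as the key of the mapping dict.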
Answer to the EDITED question
log = """[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=1, text=XXXXXXrequestID=/1540], flow=10:1] [remote=0.0.0.0, host=xxx]
[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=2, text=XXXXXX., [code=2, text=XXXXXXrequestID=/1551], flow=12:3]"""
import re
mapping = {1: '', 2:'{code', 3: '}, '}
regex = r'(\[code=\d+,\s)|(\[+code)|(\],\s)'
log_new = re.sub(regex, lambda match: mapping[match.lastindex], log)
# insert the missing ', ' between the text value and requestID
log_new = re.sub(r'(text=.+?)(requestID=)', r'\1, \2', log_new)
print(log_new)
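If the duplicate keys inside errors should also be dropped, as asked in the question, one possible approach is to pull out the errors part, split it into key=value pairs and keep only the first value per key. The following is only a sketch under some assumptions: the values never contain ', ' or ']', and requestID is always the last key. The names ERRORS_RE, parse_errors and fix_line are hypothetical helpers, not part of any library.

```python
import re

# Everything between 'errors=[[' and the first ']' (assumes values contain no ']').
ERRORS_RE = re.compile(r'errors=\[\[(.*?)\]')

def parse_errors(body):
    """Turn the raw errors body into a dict; the first value of a key wins."""
    body = body.replace('[', '')  # drop the inner '[' of repeated entries
    # Add the separator that is missing in front of requestID.
    body = re.sub(r'(?<!, )requestID=', ', requestID=', body)
    errors = {}
    for pair in body.split(', '):  # assumes values contain no ', '
        key, sep, value = pair.partition('=')
        if sep:
            errors.setdefault(key, value)  # ignore later duplicates
    return errors

def fix_line(line):
    """Rewrite errors=[[...] as errors={...} with unique keys."""
    match = ERRORS_RE.search(line)
    if match is None:
        return line  # nothing to fix on this line
    errors = parse_errors(match.group(1))
    fixed = ', '.join(f'{key}={value}' for key, value in errors.items())
    return line[:match.start()] + 'errors={' + fixed + '}' + line[match.end():]

line = ('[roomID=19, description=ZZZZ, requesterCode=20, result=-1, '
        'errors=[[code=2, text=XXXXXX., [code=2, text=XXXXXXrequestID=/1551], flow=12:3]')
print(fix_line(line))
print(parse_errors('code=2, text=XXXXXX., [code=2, text=XXXXXXrequestID=/1551'))
```

Here parse_errors already gives the errors part as a Python dict, so the rewritten line is only needed if the log file itself has to be cleaned.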