python normalizing fixing string

Time:05-16

I have a log file that contains unstructured entries like this:

[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=1, text=XXXXXXrequestID=/1540], flow=10:1] [remote=0.0.0.0, host=xxx]
[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=2, text=XXXXXX., [code=2, text=XXXXXXrequestID=/1551], flow=12:3]

As you can see, the 'errors' value starts with '[' but its brackets are never properly closed, which makes it harder to parse.

What I want to do is clean only the 'errors' part and fix it like this: replace '[]' with '{}' and remove duplicate keys from errors, so I can read it into a Python dict:

[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors={code=1, text=XXXXXX, requestID=/1540}, flow=10:1] [remote=0.0.0.0, host=xxx]
[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors={code=2, text=XXXXXX., requestID=/1551}, flow=12:3]

I'm not good at Python, but I tried this rough code. I'd appreciate your help doing it in an efficient way.

def fix(line):
    line = line.replace('errors=[[', 'errors={')
    # ...
    return line

Thank you very much

Answer:

Using the re module from the standard library, you can perform more complex text manipulations.

The important thing is to identify your patterns properly:

  • =[[ --> ={
  • ' [' (a space followed by '[') --> ' ' (just the space)
  • ] --> }

Of course, these patterns could be made more robust.

import re

log = """roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=1, text=XXXXXXrequestID=/1540], flow=10:1
roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=2, text=XXXXXX., [code=2, text=XXXXXXrequestID=/1551], flow=12:3"""

mapping = {1: '={', 2: ' ', 3: '}'}
regex = r'(=\[\[)|(\s\[)|(\])' # r stands for raw string not for regex!

log_new = re.sub(regex, lambda match: mapping[match.lastindex], log)

print(log_new)
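Why mapping[match.lastindex] picks the right replacement: each alternative in the pattern has exactly one capturing group, so for any single match only one group takes part, and match.lastindex (the index of the last matched capturing group) is that group's number. A small check on a shortened, made-up sample string; show_groups and normalize_brackets are just illustrative names:

```python
import re

# Same pattern and replacement table as above: one capturing group per alternative.
pattern = re.compile(r'(=\[\[)|(\s\[)|(\])')
replacements = {1: '={', 2: ' ', 3: '}'}


def show_groups(text):
    """Illustrative helper: list (matched text, lastindex) for every match."""
    return [(m.group(0), m.lastindex) for m in pattern.finditer(text)]


def normalize_brackets(text):
    """Illustrative helper: replace each match using the number of the group that fired."""
    return pattern.sub(lambda m: replacements[m.lastindex], text)


sample = "errors=[[code=2, text=A., [code=2, text=B], flow=12:3"  # made-up sample
print(show_groups(sample))         # [('=[[', 1), (' [', 2), (']', 3)]
print(normalize_brackets(sample))  # errors={code=2, text=A., code=2, text=B}, flow=12:3
```

Note that the repeated code=2 / text= entry survives this step; the pattern only rewrites brackets.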

Answer to the EDITED question

log = """[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=1, text=XXXXXXrequestID=/1540], flow=10:1] [remote=0.0.0.0, host=xxx]
[roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors=[[code=2, text=XXXXXX., [code=2, text=XXXXXXrequestID=/1551], flow=12:3]"""

import re

mapping = {1: '', 2: '{code', 3: '}, '}
# 1: drop a repeated "[code=N, "   2: "[[code" -> "{code"   3: "], " -> "}, "
regex = r'(\[code=\d+,\s)|(\[+code)|(\],\s)'

log_new = re.sub(regex, lambda match: mapping[match.lastindex], log)

# add the missing ", " between the text value and requestID
log_new = re.sub(r'(text=.+?)(requestID=)', r'\1, \2', log_new)

print(log_new)
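Neither approach above fully removes the repeated code/text entry in the second line, and the question also asks for a Python dict. A minimal sketch of one way to handle both, assuming the errors block always starts with "errors=[[" and ends at the first "]" after it, that values never contain "]" or a ", key=" sequence, that requestID is the only key glued onto the previous value, and that the first value of a repeated key is the one to keep. parse_errors and fix_errors are hypothetical helper names, not part of any library:

```python
import re

# Assumption: the errors block starts with "errors=[[" and ends at the first "]"
# after it, so the error text itself must not contain "]".
ERRORS_BLOCK = re.compile(r'errors=\[\[([^\]]*)\]')
# Split on ", " only when it is followed by a key such as "code=" or "text=".
FIELD_SEP = re.compile(r', (?=\w+=)')


def parse_errors(body):
    """Hypothetical helper: raw errors body -> dict, first value of each key wins."""
    body = body.replace('[', '')                              # drop the nested "[" openers
    body = re.sub(r',?\s*requestID=', ', requestID=', body)  # unglue "XXXXXXrequestID="
    fields = {}
    for part in FIELD_SEP.split(body):
        key, _, value = part.partition('=')
        fields.setdefault(key.strip(), value.strip())         # ignore repeated keys
    return fields


def fix_errors(line):
    """Hypothetical helper: rewrite the errors block of one line as errors={k=v, ...}."""
    def rebuild(match):
        fields = parse_errors(match.group(1))
        return 'errors={' + ', '.join(f'{k}={v}' for k, v in fields.items()) + '}'
    return ERRORS_BLOCK.sub(rebuild, line)


line = ("[roomID=19, description=ZZZZ, requesterCode=20, result=-1, "
        "errors=[[code=2, text=XXXXXX., [code=2, text=XXXXXXrequestID=/1551], flow=12:3]")
print(fix_errors(line))
# [roomID=19, description=ZZZZ, requesterCode=20, result=-1, errors={code=2, text=XXXXXX., requestID=/1551}, flow=12:3]
print(parse_errors(ERRORS_BLOCK.search(line).group(1)))
# {'code': '2', 'text': 'XXXXXX.', 'requestID': '/1551'}
```

Rewriting only inside the matched errors block leaves the outer [...] brackets and the [remote=..., host=...] group untouched, and dict.setdefault keeps the first value of each key while skipping later repeats.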