Correct use of Regex Match or Group Method

Time:12-07

I am trying to parse the errors in a log file and write them to a CSV file. The log looks like this:

[2019-Apr-01 13:00:02.343][Test][Info][2929:12][Writing.To.Log] Reading ABC.
[2019-Apr-01 13:00:03.343][Test][Alarm][8192:12][Test.In.Progress] Severity: Error0, Description: 'Information = ABC:1:ABC Value went over upper limit, Value = 93.0625 C, Error0UCL = 30 C.', ContactInfo: 555-555-5555
[2019-Apr-01 13:00:04.343][Test][Info][2929:12][Writing.To.Log] Reading DEF.
[2019-Apr-01 13:00:05.353][Test][Alarm][8193:12][Test.In.Progress] Severity: Error0, Description: 'Information = DEF:1:DEF Value went over upper limit, Value = 93.0625 C, Error0UCL = 30 C.', ContactInfo: 555-555-5555

So far my code looks like this:

import csv
import re

pattern = r'\[([^]]+)\]'
timePattern=r"\d{4}......."
ErrorLevelPattern= r'Error(\S+?)'
instrumentPattern= r'Information ={1}(.*?):{1}' #group1
valuePattern=r'Value ={1}(.*?),{1}'
descriptionPattern=r"('[^']+')"

with open('Parsed.csv', 'w') as out_file:
    with open('Log.txt', 'r') as in_file:
        writer = csv.writer(out_file)
        writer.writerow(['Date', 'Time', 'ErrorLevel', 'InstrumentType', 'Value', 'ContactInfo', 'Description'])
        for line in in_file:
            if "Severity: Error" in line:
                print(re.findall(instrumentPattern,line))

I am attempting to create the output file as follows:

Date Time ErrorLevel InstrumentType Value ContactInfo Description
2019-Apr-01 13:00:03.343 0 ABC 93.0625 555-555-5555 Information = ABC:1:ABC Value went over upper limit, Value = 93.0625 C, Error0UCL = 30 C.
2019-Apr-01 13:00:05.353 0 DEF 93.0625 555-555-5555 Information = DEF:1:DEF Value went over upper limit, Value = 93.0625 C, Error0UCL = 30 C.

CodePudding user response:

The patterns mix up quantifiers and groups. A quantifier such as {1} only says how many times the preceding token repeats, so ={1} is the same as =, while parentheses define the part of the match that is captured. You can simplify the individual patterns and combine them into one overall pattern. Matching descriptionPattern separately keeps the code simple:

import csv
import re

datePattern=r"(\d{4}-[a-zA-Z]{3}-\d{2})"
timePattern=r"(\d{2}:\d{2}:\d{2}\.\d{3})"
ErrorLevelPattern= r'Error(\S+),'
instrumentPattern= r'Information = (.*?):'
valuePattern=r'Value = (\S+)'
contact_pattern = r'ContactInfo: (\S+)'

# One capture group per CSV column, in column order.
full_pattern = re.compile(rf"{datePattern} {timePattern}.*?{ErrorLevelPattern}.*?{instrumentPattern}.*?{valuePattern}.*?{contact_pattern}")

# The quoted description, without the surrounding single quotes.
descriptionPattern=r"'([^']+)'"

with open('Parsed.csv', 'w', newline='') as out_file:
    with open('Log.txt', 'r') as in_file:
        writer = csv.writer(out_file)
        writer.writerow(['Date', 'Time', 'ErrorLevel', 'InstrumentType', 'Value', 'ContactInfo', 'Description'])
        for line in in_file:
            if "Severity: Error" in line:
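                match = full_pattern.search(line)
                description = re.search(descriptionPattern, line)
                # Skip lines that do not fit the expected layout instead of raising.
                if match and description:
                    writer.writerow([*match.groups(), description.group(1)])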

Output:

Date,Time,ErrorLevel,InstrumentType,Value,ContactInfo,Description
2019-Apr-01,13:00:03.343,0,ABC,93.0625,555-555-5555,"Information = ABC:1:ABC Value went over upper limit, Value = 93.0625 C, Error0UCL = 30 C."
2019-Apr-01,13:00:05.353,0,DEF,93.0625,555-555-5555,"Information = DEF:1:DEF Value went over upper limit, Value = 93.0625 C, Error0UCL = 30 C."
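
The difference between a quantifier and a group can be checked on one of the sample lines. This is only a minimal sketch: the variable names (line, with_quantifier, plain, with_group and so on) are made up for illustration and are not part of the code above.

```python
import re

# One alarm line copied from the log in the question.
line = ("[2019-Apr-01 13:00:03.343][Test][Alarm][8192:12][Test.In.Progress] "
        "Severity: Error0, Description: 'Information = ABC:1:ABC Value went over "
        "upper limit, Value = 93.0625 C, Error0UCL = 30 C.', ContactInfo: 555-555-5555")

# {1} is a quantifier: "={1}" means "exactly one '='", which is the same as "=".
with_quantifier = re.search(r'Information ={1}', line)
plain = re.search(r'Information =', line)
same_match = with_quantifier.group(0) == plain.group(0)

# Parentheses create a capture group: group(0) is the whole match,
# group(1) is only the part inside the parentheses.
with_group = re.search(r'Information = (.*?):', line)
whole = with_group.group(0)
instrument = with_group.group(1)

# "+" is the quantifier for "one or more", applied here to \S inside a group.
error_level = re.search(r'Error(\S+),', line).group(1)

print(same_match, repr(whole), instrument, error_level)
```

So the {1} in the original patterns changes nothing, and what ends up in the CSV depends only on where the parentheses are placed.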