Extracting data from a log file

Time:10-13

I have a log.txt file like this:

2022-10-12 18:15:22.992 0026/? I/AsrDecActor25: channels=1, size=82434
2022-10-12 18:15:22.992 0026/? I/AsrDecActor25: waiting asr-core ready: 12 secs
2022-10-12 18:15:23.058 0199/? I/AsrDecActor27: asr state: START, true
2022-10-12 18:15:23.058 0199/? I/AsrDecActor27: asr state 2: START
2022-10-12 18:15:23.058 0199/? I/AsrDecActor27: end of decoding 57 true 0
NEC Input :secure folder app close it
NEC Replacement suggestion :Secure folder
NEC Input Before Replace : secure folder app close it
NEC Matching Word : secure folder app
Replaced Word  : Secure folder
NEC Output After Replace : Secure folder close it
Changes : 1
2022-10-12 18:15:23.060 0199/? I/LangPackActor: eASR [NEC] Run completed, Time: 2 ms
PostProcessSubstitutions::Output of question mark processing: secure folder uninstall Kare
[eITN] Input:Secure folder uninstall kare OutputSecure folder uninstall Kare
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR [Timestamp] getTimestamp starts
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR string2IntegerList 14 20 23 32 36 
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR string2IntegerList 14 20 23 32 36 
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR levenshteinMapping
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR new ASRResult
2022-10-12 18:15:23.091 0021/? I/AsrDecActor26: decoding 

I have written code to extract the lines beginning with "NEC Input Before Replace" and "NEC Matching Word", so that my output.txt file looks like this:

NEC Input Before Replace : secure folder app close it
NEC Matching Word : secure folder app

The first version of my code was:

#!/usr/bin/env python
f = open('log.txt')
f1 = open('output.txt', 'a')

doIHaveToCopyTheLine=False

for line in f.readlines():

    if 'NEC Input Before Replace' in line:
        doIHaveToCopyTheLine=True
    elif 'NEC Matching Word' in line:
        doIHaveToCopyTheLine=True

    if doIHaveToCopyTheLine:
        f1.write(line)

f1.close()
f.close()

which threw this error:

UnicodeDecodeError                        Traceback (most recent call last)
Input In [3], in <cell line: 7>()
      3 f1 = open('output.txt', 'a')
      5 doIHaveToCopyTheLine=False
----> 7 for line in f.readlines():
      9     if 'NEC Input Before Replace' in line:
     10         doIHaveToCopyTheLine=True

File D:\Anaconda\lib\encodings\cp1252.py:23, in IncrementalDecoder.decode(self, input, final)
     22 def decode(self, input, final=False):
---> 23     return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 7524: character maps to <undefined>

So I changed the code to:

#!/usr/bin/env python
f = open('log.txt','r',encoding='utf-8')
f1 = open('output.txt', 'a')

doIHaveToCopyTheLine=False

for line in f.readlines():

    if 'NEC Input Before Replace' in line:
        doIHaveToCopyTheLine=True
    elif 'NEC Matching Word' in line:
        doIHaveToCopyTheLine=True

    if doIHaveToCopyTheLine:
        f1.write(line)

f1.close()
f.close()

The file now opens without an error, but this is what I get in the output:

NEC Input Before Replace : secure folder app close it
NEC Matching Word : secure folder app
Replaced Word  : Secure folder
NEC Output After Replace : Secure folder close it
Changes : 1
2022-10-12 18:15:23.060 0199/? I/LangPackActor: eASR [NEC] Run completed, Time: 2 ms
PostProcessSubstitutions::Output of question mark processing: secure folder uninstall Kare
[eITN] Input:Secure folder uninstall kare OutputSecure folder uninstall Kare
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR [Timestamp] getTimestamp starts
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR string2IntegerList 14 20 23 32 36 
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR string2IntegerList 14 20 23 32 36 
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR levenshteinMapping
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR new ASRResult
2022-10-12 18:15:23.091 0021/? I/AsrDecActor26: decoding 

All the lines after my desired lines are also being written. Does anyone know why this is happening and how to fix it?

CodePudding user response:

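You set doIHaveToCopyTheLine to False only once, outside the loop, before you start iterating. Once a matching line sets it to True, nothing resets it, so every later line is copied as well.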

Instead, you need to do it inside the loop, at the beginning of every iteration.
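A minimal sketch of that change, using a short in-memory list in place of log.txt so it runs on its own. The names sample_lines and copied are only for illustration; the loop body is otherwise the one from the question.

```python
# Hypothetical sample standing in for the lines read from log.txt.
sample_lines = [
    "NEC Input :secure folder app close it\n",
    "NEC Input Before Replace : secure folder app close it\n",
    "NEC Matching Word : secure folder app\n",
    "Replaced Word  : Secure folder\n",
    "Changes : 1\n",
]

copied = []  # stands in for f1.write(line)

for line in sample_lines:
    # Reset the flag at the start of every iteration so that a match
    # on an earlier line does not carry over to the lines after it.
    doIHaveToCopyTheLine = False

    if 'NEC Input Before Replace' in line:
        doIHaveToCopyTheLine = True
    elif 'NEC Matching Word' in line:
        doIHaveToCopyTheLine = True

    if doIHaveToCopyTheLine:
        copied.append(line)

print(''.join(copied), end='')
```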

But you may as well just do:

if 'NEC Input Before Replace' in line or 'NEC Matching Word' in line:
    f1.write(line)
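For reference, here is one way the whole script could look with that condition in place. This is a sketch, not the only way to write it: the helper names matching_lines and extract are made up for this example, encoding='utf-8' is taken from the question, and the output is opened in 'w' mode (the question uses 'a') on the assumption that repeated runs should not pile up duplicate lines.

```python
# The two substrings the question wants to keep.
MARKERS = ('NEC Input Before Replace', 'NEC Matching Word')


def matching_lines(lines, markers=MARKERS):
    """Yield only the lines that contain at least one of the markers."""
    for line in lines:
        if any(marker in line for marker in markers):
            yield line


def extract(src_path, dst_path, markers=MARKERS):
    """Copy the matching lines from src_path to dst_path."""
    # 'with' closes both files even if an exception is raised.
    # Iterating over the file object reads it line by line, so the
    # whole log does not have to be loaded with readlines().
    with open(src_path, 'r', encoding='utf-8') as src, \
         open(dst_path, 'w', encoding='utf-8') as dst:
        dst.writelines(matching_lines(src, markers))
```

Calling extract('log.txt', 'output.txt') should then leave only the two wanted lines in output.txt. Switch the mode back to 'a' if you do want to append to an existing file.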