I have a log.txt file like this:
2022-10-12 18:15:22.992 0026/? I/AsrDecActor25: channels=1, size=82434
2022-10-12 18:15:22.992 0026/? I/AsrDecActor25: waiting asr-core ready: 12 secs
2022-10-12 18:15:23.058 0199/? I/AsrDecActor27: asr state: START, true
2022-10-12 18:15:23.058 0199/? I/AsrDecActor27: asr state 2: START
2022-10-12 18:15:23.058 0199/? I/AsrDecActor27: end of decoding 57 true 0
NEC Input :secure folder app close it
NEC Replacement suggestion :Secure folder
NEC Input Before Replace : secure folder app close it
NEC Matching Word : secure folder app
Replaced Word : Secure folder
NEC Output After Replace : Secure folder close it
Changes : 1
2022-10-12 18:15:23.060 0199/? I/LangPackActor: eASR [NEC] Run completed, Time: 2 ms
PostProcessSubstitutions::Output of question mark processing: secure folder uninstall Kare
[eITN] Input:Secure folder uninstall kare OutputSecure folder uninstall Kare
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR [Timestamp] getTimestamp starts
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR string2IntegerList 14 20 23 32 36
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR string2IntegerList 14 20 23 32 36
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR levenshteinMapping
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR new ASRResult
2022-10-12 18:15:23.091 0021/? I/AsrDecActor26: decoding
I have written code to extract the lines beginning with "NEC Input Before Replace" and "NEC Matching Word", so that my output.txt file looks like this:
NEC Input Before Replace : secure folder app close it
NEC Matching Word : secure folder app
The first version of the code I wrote for this was:
#!/usr/bin/env python
f = open('log.txt')
f1 = open('output.txt', 'a')
doIHaveToCopyTheLine=False
for line in f.readlines():
    if 'NEC Input Before Replace' in line:
        doIHaveToCopyTheLine=True
    elif 'NEC Matching Word' in line:
        doIHaveToCopyTheLine=True
    if doIHaveToCopyTheLine:
        f1.write(line)
f1.close()
f.close()
which raised this error:
UnicodeDecodeError Traceback (most recent call last)
Input In [3], in <cell line: 7>()
3 f1 = open('output.txt', 'a')
5 doIHaveToCopyTheLine=False
----> 7 for line in f.readlines():
9 if 'NEC Input Before Replace' in line:
10 doIHaveToCopyTheLine=True
File D:\Anaconda\lib\encodings\cp1252.py:23, in IncrementalDecoder.decode(self, input, final)
22 def decode(self, input, final=False):
---> 23 return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 7524: character maps to <undefined>
So I changed the code to:
#!/usr/bin/env python
f = open('log.txt','r',encoding='utf-8')
f1 = open('output.txt', 'a')
doIHaveToCopyTheLine=False
for line in f.readlines():
    if 'NEC Input Before Replace' in line:
        doIHaveToCopyTheLine=True
    elif 'NEC Matching Word' in line:
        doIHaveToCopyTheLine=True
    if doIHaveToCopyTheLine:
        f1.write(line)
f1.close()
f.close()
The file now opens without an error, but this is what I get in the output:
NEC Input Before Replace : secure folder app close it
NEC Matching Word : secure folder app
Replaced Word : Secure folder
NEC Output After Replace : Secure folder close it
Changes : 1
2022-10-12 18:15:23.060 0199/? I/LangPackActor: eASR [NEC] Run completed, Time: 2 ms
PostProcessSubstitutions::Output of question mark processing: secure folder uninstall Kare
[eITN] Input:Secure folder uninstall kare OutputSecure folder uninstall Kare
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR [Timestamp] getTimestamp starts
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR string2IntegerList 14 20 23 32 36
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR string2IntegerList 14 20 23 32 36
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR levenshteinMapping
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR new ASRResult
2022-10-12 18:15:23.091 0021/? I/AsrDecActor26: decoding
All the lines after my desired lines are also being written. Does anyone know why this is happening and how to fix it?
CodePudding user response:
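You set doIHaveToCopyTheLine to False only once, outside the loop, before you start iterating. Once a matching line sets it to True, nothing sets it back, so every later line is written as well.
Instead, you need to reset it to False inside the loop, at the beginning of every iteration.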
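As a minimal sketch of that fix, the flag can be reset at the top of each iteration. To keep the example self-contained it works on a list of lines rather than on log.txt; the function name copy_matching and the snake_case flag name are illustrative, not from the question.

```python
def copy_matching(lines):
    """Return only the lines containing one of the two markers."""
    kept = []
    for line in lines:
        # Reset the flag for every line, so an earlier match
        # cannot leak into later iterations.
        do_i_have_to_copy_the_line = False
        if 'NEC Input Before Replace' in line:
            do_i_have_to_copy_the_line = True
        elif 'NEC Matching Word' in line:
            do_i_have_to_copy_the_line = True
        if do_i_have_to_copy_the_line:
            kept.append(line)
    return kept
```

With the reset in place, a line such as "Replaced Word : Secure folder" that follows a match is no longer copied.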
But you may as well just do:
if 'NEC Input Before Replace' in line or 'NEC Matching Word' in line:
    f1.write(line)
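Put together, the whole script could look roughly like the sketch below. It is one possible arrangement, not the only one: the function name extract_lines and the MARKERS tuple are hypothetical, the input is assumed to be UTF-8 as in the second version of the question's code, and the output is opened in 'w' mode rather than the question's 'a' so that repeated runs do not accumulate duplicate lines (switch back to 'a' if appending is what you want).

```python
# Substrings that mark a line as one to keep (taken from the question).
MARKERS = ('NEC Input Before Replace', 'NEC Matching Word')


def extract_lines(src_path, dst_path, markers=MARKERS):
    """Copy every line of src_path that contains a marker into dst_path."""
    # The with-statement closes both files even if an error occurs.
    with open(src_path, 'r', encoding='utf-8') as src, \
         open(dst_path, 'w', encoding='utf-8') as dst:
        # Iterating over the file object reads one line at a time,
        # so the whole log is never held in memory.
        for line in src:
            if any(marker in line for marker in markers):
                dst.write(line)
```

Calling extract_lines('log.txt', 'output.txt') would then write only the two kinds of NEC lines. Since the goal is lines that begin with those strings, line.startswith(markers) would be a stricter test than the substring check, because str.startswith accepts a tuple of prefixes.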