I have a sample text file with data like this:
<iden/><provider></provider><trace>065110d4-cec5-d433772ed57a</trace>
<ServiceRQ>Some xml data</ServiceRQ>
<iden/><provider></provider>
<ServiceRQ>Some xml data</ServiceRQ>
The file continues like this and is quite large.
On every odd line, I want to check whether <trace>065110d4-cec5-43f9-b089-d433772ed57a</trace> is present. If it is, replace it with <trace>xyz</trace>; if it is not, append <trace>xyz</trace> to the line.
My code:
import re

with open("Sample_xml.txt", 'r') as fp:
    output = fp.readlines()
s = len(output) - 1
tc = 0
rq = 1
while (tc <= s) and (rq <= s):
    if tc % 2 == 0:
        a = output[tc]
        if a.find("<trace") != -1:
            a = re.sub('(?<=<trace>)(.*?)(?=</trace>)', 'xyz', a)
            print(a)
        elif a.find("<trace>") == -1:
            a = a.rstrip() + '<trace>xyz</trace>' + '\n'
            print(a)
    if rq % 2 != 0:
        b = output[rq]
        print(b)
    with open("Fin_xml.txt", "a") as myfile:
        myfile.write(a)
        myfile.write(b + '\n')
    tc += 2
    rq += 2
The file is very large (over 800 MB), so this code does not work properly with readlines(). Can someone help me with my code?
Answer:
If I understand correctly, you don't want to read the entire file into memory (though I don't think 800 MB is that much).
In that case, instead of fp.readlines(), you can call next(fp) to read the lines one at a time. Call it twice if you want to skip a line.
Also take a look at "read large files in python".
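A minimal sketch of the next(fp) approach, assuming (as in the sample data) that the odd lines carry the trace element and the even lines are copied unchanged. The helper names fix_header and rewrite_pairs are hypothetical, and io.StringIO stands in for the real files so the example runs on its own.

```python
import io
import re

# The question's regex: the text between <trace> and </trace>
TRACE_RE = re.compile(r'(?<=<trace>)(.*?)(?=</trace>)')


def fix_header(line):
    """Hypothetical helper: replace the trace value, or append a trace element."""
    if "<trace>" in line:
        return TRACE_RE.sub('xyz', line)
    return line.rstrip() + '<trace>xyz</trace>\n'


def rewrite_pairs(src, dst):
    """Hypothetical helper: copy src to dst two lines at a time using next().

    Only the current pair of lines is held in memory.
    """
    while True:
        header = next(src, None)  # odd line: 1st, 3rd, 5th, ...
        if header is None:
            break                 # end of input
        dst.write(fix_header(header))
        body = next(src, None)    # even line: copied unchanged
        if body is None:
            break
        dst.write(body)


# Demo with in-memory "files" (made-up sample data)
sample = io.StringIO(
    "<iden/><provider></provider><trace>abc</trace>\n"
    "<ServiceRQ>Some xml data</ServiceRQ>\n"
    "<iden/><provider></provider>\n"
    "<ServiceRQ>Some xml data</ServiceRQ>\n"
)
out = io.StringIO()
rewrite_pairs(sample, out)
print(out.getvalue(), end="")
```

With real files you would open both once, e.g. `with open("Sample_xml.txt") as src, open("Fin_xml.txt", "w") as dst: rewrite_pairs(src, dst)`. Mode "w" overwrites the output on every run, whereas the question's "a" keeps appending to it.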
Answer:
As @Archili Robakidze pointed out, you can read a large file line by line, which ensures that only one line at a time is kept in memory.
You can also simplify the code by using enumerate to get the line number. Since you count the very first line of the file as line 1 rather than line 0, use enumerate(fp, 1):
import re

# Open the output file once instead of reopening it for every line
with open("Sample_xml.txt", 'r') as fp, open("Fin_xml.txt", "a") as myfile:
    for i, line in enumerate(fp, 1):
        # If the line number is odd, do the check and replace
        if i % 2 != 0:
            if "<trace>" in line:
                line = re.sub('(?<=<trace>)(.*?)(?=</trace>)', 'xyz', line)
            else:
                line = line.rstrip() + '<trace>xyz</trace>' + '\n'
            print(line)
        # Write every line, modified or not, to the output file
        myfile.write(line)
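As a variation on the same idea, the line-fixing logic can be pulled out into a generator that accepts any iterable of lines. This is a sketch, not part of the answer's code: rewrite_lines is a hypothetical name, and it checks for "<trace>" consistently instead of mixing "<trace" and "<trace>".

```python
import re
from typing import Iterable, Iterator

# The answer's regex: the text between <trace> and </trace>
TRACE_RE = re.compile(r'(?<=<trace>)(.*?)(?=</trace>)')


def rewrite_lines(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: yield every line, fixing the trace on odd lines.

    Accepts any iterable of lines, so a file object is consumed lazily,
    one line at a time.
    """
    for i, line in enumerate(lines, 1):  # count the first line as line 1
        if i % 2 != 0:
            if "<trace>" in line:
                line = TRACE_RE.sub('xyz', line)
            else:
                line = line.rstrip() + '<trace>xyz</trace>\n'
        yield line


# Demo on an in-memory list (made-up sample lines)
sample = [
    "<iden/><provider></provider><trace>abc</trace>\n",
    "<ServiceRQ>Some xml data</ServiceRQ>\n",
    "<iden/><provider></provider>\n",
    "<ServiceRQ>Some xml data</ServiceRQ>\n",
]
for out_line in rewrite_lines(sample):
    print(out_line, end="")
```

For the real files this becomes `with open("Sample_xml.txt") as src, open("Fin_xml.txt", "w") as dst: dst.writelines(rewrite_lines(src))`. Keeping the transformation separate from the file handling makes it easy to check on a small list before running it over the whole 800 MB file.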