I have been trying to delete lines from a file without loading the whole file into memory, because it is too large (~1 GB). How do I do it without leaving a blank line in the file?
For example:
I want to go from this:
foo bar
this is the line to be removed
foo bar
foo bar
To this:
foo bar
foo bar
foo bar
But I get this:
foo bar

foo bar
foo bar
So I have managed to delete the line, but I also want to remove the blank line it leaves behind. What I do so far is move the file pointer (cursor) to the start of the line and overwrite the line with spaces:
a = f.tell()
f.readline()
b = f.tell()
f.seek(a)
l2 = b-a-1
blank = " "*l2
f.write(blank)
f.seek(a)
CodePudding user response:
A much simpler approach to filtering a file in place is to open the same file twice, once for reading and once for writing, write out only the lines that need to be kept, and truncate the output at the end. This way, no tell, seek, or other file-position calculations are needed:
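with open('file.txt') as file, open('file.txt', 'r+') as output:
    for line in file:
        # Copy every line except the one to drop; the writer never overtakes the reader.
        if line != 'this is the line to be removed\n':
            output.write(line)
    # Cut off the stale tail left beyond the last kept line.
    output.truncate()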
Demo: https://replit.com/@blhsing/SeagreenSlushyAutoresponder
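The same idea can be wrapped in a reusable function. The following is only a sketch: the name filter_file_in_place and the keep callback are hypothetical, not part of the answer above, and it opens the file in binary mode on the assumption that avoiding newline translation is desirable for a large file.

```python
def filter_file_in_place(path, keep):
    """Rewrite `path` in place, keeping only lines for which keep(line) is true.

    Hypothetical helper: `keep` receives each line as bytes, including its
    trailing newline, and returns True if the line should stay in the file.
    """
    # Two handles on the same file: one only reads, one only writes.
    # Binary mode means no newline translation, so bytes are copied verbatim.
    with open(path, 'rb') as reader, open(path, 'r+b') as writer:
        for line in reader:
            if keep(line):
                # The writer has written at most as many bytes as the reader
                # has consumed, so unread data is never overwritten.
                writer.write(line)
        # Discard everything beyond the last kept line.
        writer.truncate()
```

It could be called as filter_file_in_place('file.txt', lambda line: line != b'this is the line to be removed\n'). Only one line plus the I/O buffers is held in memory at a time, but an interruption midway leaves the file partially rewritten.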
CodePudding user response:
If you do need to remove the lines in place, which can be fraught with danger, then you could try the following. Basically, it keeps track of the latest line read and the latest line written, and truncates from the end of the last line written once the input is exhausted. Please test before use!
with open('file.txt', 'r+') as f:
    # Both the read and the write position start at the beginning of the file.
    r_pos = w_pos = f.tell()
    while True:
        f.seek(r_pos)
        line = f.readline()
        if not line:
            break
        r_pos = f.tell()
        if 'remove' not in line:  # or your criteria
            f.seek(w_pos)
            f.write(line)
            w_pos = f.tell()
    # Truncate at the end of the last line written.
    f.seek(w_pos)
    f.truncate()
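For comparison, if the in-place requirement can be relaxed, a common way to avoid that danger is to stream the kept lines to a temporary file and then swap it in with os.replace. This is a different technique from the one above, shown only as a sketch: the name filter_file_via_temp and the keep callback are hypothetical, and it assumes there is enough free disk space for a second copy of the file.

```python
import os
import tempfile


def filter_file_via_temp(path, keep):
    """Stream the kept lines of `path` to a temp file, then replace `path`.

    Hypothetical helper: `keep` receives each line as bytes, including its
    trailing newline. The original file is untouched until the final swap.
    """
    # Create the temp file in the same directory so os.replace stays on one
    # filesystem. Note that mkstemp does not copy the original's permissions.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'wb') as tmp, open(path, 'rb') as src:
            for line in src:
                if keep(line):
                    tmp.write(line)
        # Swap the filtered copy into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the temp file and leave the original as is.
        os.remove(tmp_path)
        raise
```

As with the in-place versions, only one line is held in memory at a time; the trade-off is the temporary extra disk usage.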