How to modify and overwrite large files?

Time:10-30

I want to modify some lines in a file and overwrite the file with the result. I do not want to create a new file with the changes, and since the file is large (hundreds of MB), I don't want to read it all into memory at once.

datfile = 'C:/some_path/text.txt'

with open(datfile) as file:
    for line in file:
        if line.split()[0] == 'TABLE':
            # if this is true, I want to change the second word of the line
            # something like: line.split()[1] = 'new'
            pass  # placeholder so the block is valid Python

Please note that an important part of the problem is that the file is big. There are several solutions on the site that address similar problems, but they do not account for the size of the file.

Is there a way to do this in Python?

CodePudding user response:

Unless the replacement is exactly the same length as the text it replaces, you can't change a portion of a file without rewriting the remainder of the file. That is true in any language, not just Python. Each byte of a file lives at a fixed offset on a disk or in flash memory. If the text you want to insert is shorter or longer than the text it replaces, the remainder of the file has to be moved. If your replacement is longer than the original text, writing in place would overwrite data you have not read yet, so you will want to write a new file instead.
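The one case that does work in place is a replacement with exactly the same byte length as the original. A minimal sketch of that special case follows; the function name overwrite_second_word and its parameters are made up for illustration, and it assumes the file can be handled as bytes and that any line whose second word has a different length should simply be left alone.

```python
def overwrite_second_word(path, new_word, keyword=b'TABLE'):
    """Overwrite the second word of each `keyword` line in place.

    Only lines whose second word has the same byte length as `new_word`
    are touched, so no other byte in the file moves. Returns the number
    of lines changed. (Hypothetical helper, not part of any library.)
    """
    changed = 0
    with open(path, 'r+b') as f:
        while True:
            start = f.tell()          # offset of the line about to be read
            line = f.readline()
            if not line:
                break                 # end of file
            words = line.split()
            if len(words) < 2 or words[0] != keyword:
                continue
            if len(words[1]) != len(new_word):
                continue              # a length change would need a full rewrite
            end = f.tell()            # offset of the next line
            # Find the second word's position inside this line.
            after_keyword = line.index(keyword) + len(keyword)
            offset = line.index(words[1], after_keyword)
            f.seek(start + offset)
            f.write(new_word)
            f.seek(end)               # resume reading where we left off
            changed += 1
    return changed
```

Calling overwrite_second_word('C:/some_path/text.txt', b'new') would patch only the TABLE lines whose second word is three bytes long. For anything else, the temporary-file approach is the safer route.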

Given how file I/O works, and the operations you are already performing on the file, making a new file will not be as big a problem as you think. You are already reading the entire file line by line and parsing the content, so only one line plus the I/O buffers is in memory at any time. Doing a buffered write of the replacement data will not be all that expensive.

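from tempfile import NamedTemporaryFile
from os import remove, replace
from os.path import dirname

datfile = 'C:/some_path/text.txt'
tname = None  # set once the temporary file exists

try:
    # Create the temporary file next to the original so the final rename stays on one volume.
    with open(datfile) as file, NamedTemporaryFile(mode='wt', dir=dirname(datfile), delete=False) as output:
        tname = output.name
        for line in file:
            if line.startswith('TABLE'):
                ls = line.split()
                ls[1] = 'new'
                line = ' '.join(ls) + '\n'
            output.write(line)
except BaseException:
    # Delete the partial temporary file, then let the original error propagate.
    if tname is not None:
        remove(tname)
    raise
else:
    # os.replace renames over an existing file; os.rename fails on Windows if the target exists.
    replace(tname, datfile)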

Passing dir=dirname(datfile) to NamedTemporaryFile creates the temporary file in the same directory as the original, and therefore normally on the same filesystem, so the final rename does not have to copy the file from one disk to another. Using delete=False keeps the temporary file around after it is closed, which allows you to do the rename if the operation succeeds. The temporary file is deleted by name if any problem occurs, and renamed over the original file otherwise.
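If you need this for more than one kind of edit, the same pattern can be wrapped in a function that takes the per-line change as a callback. This is only a sketch of one way to arrange it: rewrite_lines and retag_table are hypothetical names, and it assumes the file is text in the platform's default encoding.

```python
import os
from tempfile import NamedTemporaryFile


def rewrite_lines(path, transform):
    """Stream `path` through `transform` one line at a time, then swap the result in.

    `transform` receives each line (including its newline) and returns the
    text to write in its place. (Hypothetical helper for illustration.)
    """
    tname = None
    try:
        # Keep the temporary file in the same directory as the original.
        with open(path) as src, NamedTemporaryFile(
                mode='wt', dir=os.path.dirname(path) or '.', delete=False) as dst:
            tname = dst.name
            for line in src:
                dst.write(transform(line))
    except BaseException:
        # Remove the partial output and re-raise the original error.
        if tname is not None:
            os.remove(tname)
        raise
    # Both files are closed here, so the rename is safe on Windows too.
    os.replace(tname, path)


def retag_table(line):
    """Example transform: set the second word of TABLE lines to 'new'."""
    words = line.split()
    if len(words) >= 2 and words[0] == 'TABLE':
        words[1] = 'new'
        return ' '.join(words) + '\n'
    return line
```

With these definitions, rewrite_lines(datfile, retag_table) performs the edit from the question. The len(words) >= 2 check avoids an IndexError on blank lines or on a line that contains only TABLE.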
