How to add to the beginning of each line of a large file (>100GB) the index of that line with Pyt

Time:05-29

some_file.txt: (before)

one
two
three
four
five
...

How can I efficiently modify a large file in Python?

with open("some_file.txt", "r+") as file:
    for idx, line in enumerate(file.readlines()):
        file.writeline(f'{idx} {line}') # something like this

some_file.txt: (after)

1 one
2 two
3 three
4 four
5 five
...

CodePudding user response:

Don't try to load your entire file in memory, because the file may be too large for that. Instead, read line by line:

with open('input.txt') as inp, open('output.txt', 'w') as out:
    idx = 1
    for line in inp:
        out.write(f'{idx} {line}')  # line already ends with its newline
        idx += 1

You can't insert into the middle of a file without rewriting everything after the insertion point. That is a limitation of the operating system's file model, not of Python.
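
Since the file has to be rewritten anyway, one way to get an "in-place" result is to write the numbered lines to a temporary file in the same directory and then swap it over the original. This is only a minimal sketch, assuming there is enough free disk space for a second copy of the file; the function name number_lines_in_place is hypothetical and not part of the answer above.

```python
import os
import tempfile


def number_lines_in_place(path, start=1):
    """Prefix each line of `path` with its index by rewriting via a temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            # Binary mode skips decoding; only one line is held in memory at a time.
            for idx, line in enumerate(src, start):
                dst.write(b"%d " % idx + line)
        # Both files are closed here; swap the numbered copy over the original.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched if anything fails.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling number_lines_in_place("some_file.txt") would turn "one", "two", "three" into "1 one", "2 two", "3 three". Note that mkstemp creates the temporary file readable and writable only by the current user, so the file mode may need to be restored afterwards if that matters.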

CodePudding user response:

Use pathlib for path manipulation. Rename the original file. Then copy it to a new file, adding the line numbers as you go. Keep the old file until you verify the new file is correct.

Open files are iterable, so you can use enumerate() on them directly without having to use readlines() first. The second argument to enumerate() is the number to start the count with. So the loop below will number the lines starting with 1.

from pathlib import Path

target = Path("some_file.txt")

# rename the file with ".old" suffix
original = target.rename(target.with_suffix(".old"))

with original.open("r") as source, target.open("w") as sink:
    for line_no, line in enumerate(source, 1):
        # file objects have write(), not writeline(); line keeps its own newline
        sink.write(f'{line_no} {line}')
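
The advice to keep the old file until the new one is verified could be checked with a streaming comparison like the one below. It is a sketch under stated assumptions: the helper name numbered_copy_is_correct is hypothetical, and it assumes the numbered file was produced exactly as "<index> <original line>" with no change of encoding or line endings.

```python
from itertools import zip_longest


def numbered_copy_is_correct(original_path, numbered_path, start=1):
    """Return True if every numbered line is b'<index> ' + the original line."""
    with open(original_path, "rb") as old, open(numbered_path, "rb") as new:
        pairs = zip_longest(old, new)
        for idx, (old_line, new_line) in enumerate(pairs, start):
            # A None on either side means the files have different line counts.
            if old_line is None or new_line is None:
                return False
            if new_line != b"%d " % idx + old_line:
                return False
    return True
```

If numbered_copy_is_correct("some_file.old", "some_file.txt") returns True, the ".old" file can be deleted. Like the copy itself, the check reads both files one line at a time, so memory use stays flat regardless of file size.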