python script for loop to print contents of files

Time:04-17

I need to write a Python script that creates a FASTA file containing all of the records from all of the .fa files in /resources/pvalb/. The script should take a list of files and output the contents of all of them (essentially mimicking the cat command from bash).

Here's what I have so far as an example:

> import sys
> 
> filenames = sys.argv[1:]
> 
> for filename in filenames:
>     for line in open(filename):
>         line = line.rstrip("\n")
>         print(line)

CodePudding user response:

Here are some things to consider for your script:

  • The files might contain binary data or an unrecognized text encoding
  • The files may be quite large
    • They may even be larger than the memory available on your system
    • It's even possible that one "line" would be larger than the memory on your system

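To solve these problems, it's best to open the files in binary mode with 'b', so Python never tries to decode the bytes as text, and then read them in fixed-size chunks, so memory use stays bounded no matter how large a file or a single "line" is.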

Here's an example doing so in chunks of up to 4 KiB each:

import sys

for path in sys.argv[1:]:
    with open(path, 'rb') as file:
        while data := file.read(4096):
            sys.stdout.buffer.write(data)
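
Since the question asks for a single FASTA file built from every .fa file in a directory, the same chunked loop can write to an output file instead of stdout. This is only a sketch: the function name merge_fasta, its parameters, and the output file name in the usage note are made up for illustration and are not from the question.

```python
from pathlib import Path


def merge_fasta(src_dir, out_path, chunk_size=4096):
    """Concatenate every .fa file in src_dir into out_path, byte for byte.

    merge_fasta and its parameters are hypothetical names for this sketch.
    """
    out_path = Path(out_path)
    # Sort for a reproducible order, and skip the output file itself in case
    # it lives in src_dir and also ends in .fa.
    sources = sorted(
        p for p in Path(src_dir).glob("*.fa")
        if p.resolve() != out_path.resolve()
    )
    with open(out_path, "wb") as out:
        for src in sources:
            with open(src, "rb") as file:
                # Same chunked read as above: at most chunk_size bytes in memory.
                while data := file.read(chunk_size):
                    out.write(data)
    return sources
```

Calling merge_fasta("/resources/pvalb/", "combined.fasta") would then write all records to combined.fasta in the current directory. Like cat, this copies bytes verbatim, so if an input file does not end with a newline, the next file's first header line ends up glued to the previous line.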

Some more tips:

  • Most file systems, most SSDs, and newer HDDs store data in 4 KiB blocks, so reading and writing in 4 KiB chunks is usually an efficient choice
  • For very large files, especially in "slow" languages like Python, you can often get faster performance by increasing the chunk size to 1 MiB or even 10 MiB, because the loop then makes far fewer read and write calls
  • 1 MiB and 10 MiB chunks are common on big, distributed file systems like Lustre and BeeGFS, though these file systems can have a wide variety of block sizes.
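
To make the chunk size easy to experiment with, one option is to pass it as a parameter and let shutil.copyfileobj from the standard library run the read/write loop. This is a sketch under assumptions: the names cat and MIB are invented here, and the 1 MiB default is just one of the sizes mentioned above, not a measured optimum.

```python
import shutil
import sys

MIB = 1024 * 1024  # hypothetical constant for this sketch


def cat(paths, out=None, chunk_size=MIB):
    """Copy each file in paths to out (stdout by default) in chunk_size pieces."""
    if out is None:
        out = sys.stdout.buffer  # binary stdout, as in the example above
    for path in paths:
        with open(path, "rb") as src:
            # copyfileobj reads and writes at most chunk_size bytes at a time.
            shutil.copyfileobj(src, out, chunk_size)
```

Calling cat(sys.argv[1:]) reproduces the script above with 1 MiB chunks, and cat(sys.argv[1:], chunk_size=10 * MIB) tries the larger size. Which one is faster depends on the storage, so it is worth timing both on real data.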