Is there a faster way to extract lines from a file?

Time:02-11

I have a set of files that I need to search through to extract certain lines. Right now I'm using a for loop, but it is proving slow. Is there a faster way than the code below?

import re

for file in files:
    localfile = open(file, 'r')
    for line in localfile:
        if re.search("Common English Words", line):
            words = line.split("|")[0]
            # Append words to file words.txt
            open("words.txt", "a+").write(words + "\n")

User response:

Well, for one thing, you are opening words.txt again every time you write to it, which creates a new file object (and file descriptor) on each write. I ran some tests and found that Python's garbage collection does in fact close open file descriptors once they become inaccessible (at least in my test case). Even so, opening the file every time you want to append a line is going to be costly. For future reference, it is considered good practice to open files in with ... as blocks.
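If you want to check this on your own machine, a rough timing comparison is easy to put together. The sketch below is only illustrative: the helper names (append_reopening, append_once, timed) and the 2,000-line workload are made up for the example, it writes to scratch files instead of words.txt, and the absolute timings will vary with your OS and filesystem.

```python
import os
import tempfile
import time


def append_reopening(path, lines):
    """Open the output file again for every single line (the slow pattern)."""
    for line in lines:
        with open(path, "a") as out:
            out.write(line + "\n")


def append_once(path, lines):
    """Open the output file once and reuse the handle for every line."""
    with open(path, "a") as out:
        for line in lines:
            out.write(line + "\n")


def timed(func, path, lines):
    """Return the wall-clock seconds func takes to write lines to path."""
    start = time.perf_counter()
    func(path, lines)
    return time.perf_counter() - start


# Hypothetical workload: 2,000 short lines written to scratch files.
sample_lines = [f"word{i}" for i in range(2000)]
with tempfile.TemporaryDirectory() as tmp:
    slow_path = os.path.join(tmp, "reopen.txt")
    fast_path = os.path.join(tmp, "once.txt")
    slow_seconds = timed(append_reopening, slow_path, sample_lines)
    fast_seconds = timed(append_once, fast_path, sample_lines)
    # Both approaches must produce identical output; only the cost differs.
    with open(slow_path) as a, open(fast_path) as b:
        same_output = a.read() == b.read()

print(f"reopen per line: {slow_seconds:.4f}s, open once: {fast_seconds:.4f}s")
```

The two functions write exactly the same bytes, so any difference in the printed times comes from the repeated open and close calls.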

TLDR: One improvement you could make is to open the file you are writing to just once. Here is what that would look like:

import re

with open("words.txt", "a+") as words_file:
    for file in files:
        localfile = open(file, 'r')
        for line in localfile:
            if re.search("Common English Words", line):
                words = line.split("|")[0]
                # Append words to file words.txt
                words_file.write(words + "\n")

Like I said, opening files in with ... as statements is considered best practice, because each file is closed automatically when its block exits. We can apply that to the input files as well, like so:

import re

with open("words.txt", "a+") as words_file:
    for file in files:
        with open(file, 'r') as localfile:
            for line in localfile:
                if re.search("Common English Words", line):
                    words = line.split("|")[0]
                    # Append words to file words.txt
                    words_file.write(words + "\n")
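One more thing you could try: "Common English Words" is a fixed string with no regex metacharacters, so a plain substring test matches the same lines as re.search here. Below is a sketch of that variation wrapped in a function. The function name extract_matching_prefixes and its parameters are my own invention for the example, and the substring check is a substitution, not part of the code above. Whether it is noticeably faster depends on your data, so measure before relying on it.

```python
def extract_matching_prefixes(paths, out_path, needle="Common English Words"):
    """Append the text before the first "|" of every line containing needle.

    Returns the number of lines written to out_path.
    """
    written = 0
    # Open the output file once and keep it open for all input files.
    with open(out_path, "a") as words_file:
        for path in paths:
            with open(path, "r") as localfile:
                for line in localfile:
                    # Literal substring test used in place of re.search.
                    if needle in line:
                        words_file.write(line.split("|")[0] + "\n")
                        written += 1
    return written
```

With the variables from the question, the call would be extract_matching_prefixes(files, "words.txt").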