I have a set of files that I need to search through and extract certain lines. Right now, I'm using a for
loop but this is proving costly in terms of time. Is there a faster way than the below?
import re
for file in files:
    localfile = open(file, 'r')
    for line in localfile:
        if re.search("Common English Words", line):
            words = line.split("|")[0]
            # Append words to file words.txt
            open("words.txt", "a+").write(words + "\n")
CodePudding user response:
Well, for one thing, you are creating a new file descriptor every time you write to the words.txt file. I ran some tests and found that Python's garbage collection does in fact close open file descriptors once they become inaccessible (at least in my test case). Even so, creating a new file descriptor for every append is costly. For future reference, it is considered good practice to use with ... as blocks for opening files.
TLDR: One improvement you could make is to open the file you are writing to just once. Here is what that would look like:
import re
with open("words.txt", "a+") as words_file:
    for file in files:
        localfile = open(file, 'r')
        for line in localfile:
            if re.search("Common English Words", line):
                words = line.split("|")[0]
                # Append words to file words.txt
                words_file.write(words + "\n")
Like I said, using with ... as statements when opening files is considered best practice. We can apply it to the input files as well, like so:
import re
with open("words.txt", "a+") as words_file:
    for file in files:
        with open(file, 'r') as localfile:
            for line in localfile:
                if re.search("Common English Words", line):
                    words = line.split("|")[0]
                    # Append words to file words.txt
                    words_file.write(words + "\n")
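One more small speedup you could try is compiling the regex once with re.compile instead of re-parsing the pattern on every line. Here is a self-contained sketch of the full approach; the sample input files it creates in a temp directory are hypothetical, just so the snippet runs standalone:

    import os
    import re
    import tempfile

    # Hypothetical sample inputs so the sketch is runnable on its own.
    tmpdir = tempfile.mkdtemp()
    files = []
    for i, text in enumerate(["Common English Words|the|and\nother line\n",
                              "noise\nCommon English Words|of|to\n"]):
        path = os.path.join(tmpdir, "in%d.txt" % i)
        with open(path, "w") as f:
            f.write(text)
        files.append(path)

    # Compile the pattern once, outside the loops.
    pattern = re.compile("Common English Words")

    out_path = os.path.join(tmpdir, "words.txt")
    with open(out_path, "a+") as words_file:
        for file in files:
            with open(file, "r") as localfile:
                for line in localfile:
                    if pattern.search(line):
                        # Keep only the text before the first "|" delimiter.
                        words_file.write(line.split("|")[0] + "\n")

Functions in the re module do cache compiled patterns internally, so the gain is usually modest, but hoisting the compile out of the hot loop makes the cost explicit and is a common idiom for line-scanning scripts like this.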