I want to find the line index of a phrase in a text file, so I wrote this function:
def findLineIndex(file, phase, minLineNum=-1):
    for lineNum, line in enumerate(file):
        if phase in line:
            if lineNum > minLineNum:
                return lineNum
    return -1
This works fine. I have a text file called idk.txt:
0
1
2
3
4
5
6
7
8
9
10
Using the function to find the line number of "3" works fine:
textFile = open("idk.txt") # the file is in the same folder as the script
print(findLineIndex(textFile, "3"))
# output: 3
Here is the problem. I have this code:
textFile = open("idk.txt") # the file is in the same folder as the script
print(findLineIndex(textFile, "3"))
# output 3
print(findLineIndex(textFile, "6"))
# output 2
The output is 3 and 2, but it is supposed to be 3 and 6. Running the program in debug mode shows that the second call continues reading the file where the previous call left off, while lineNum is set back to 0. Any further findLineIndex calls also start reading where the last one left off. I have no idea why this happens. Can someone please help me?
CodePudding user response:
As you have noticed, when you pass the same textFile object to findLineIndex twice, the second call continues where the first one left off, because a file object is an iterator over the lines in the file and does not start over on its own.
This means that a later call can never find a line that comes before a line an earlier call has already read.
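To see the effect in isolation, here is a minimal sketch. It uses io.StringIO as a stand-in for the opened file (an assumption, so that the example runs without idk.txt on disk). The first loop stops at the line "3"; the second enumerate starts counting at 0 again, but the underlying iterator resumes at the line "4", which is why "6" is reported at index 2.

```python
import io

# Stand-in for open("idk.txt"): a file-like object with the lines 0..10.
# (Assumption for illustration; a real file object behaves the same way.)
fake_file = io.StringIO("".join(f"{n}\n" for n in range(11)))

# First pass: stop as soon as "3" is found, like findLineIndex does.
for line_num, line in enumerate(fake_file):
    if "3" in line:
        first_result = line_num
        break

# Second pass over the SAME object: enumerate restarts its counter at 0,
# but the file iterator resumes right after the line "3".
second_pass = [(line_num, line.strip()) for line_num, line in enumerate(fake_file)]

print(first_result)     # 3
print(second_pass[:3])  # [(0, '4'), (1, '5'), (2, '6')]
```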
Depending on your constraints there are several options to solve this:
You only want to find lines in the order in which they appear in the file
If you do not need to look back at lines you have already searched, you can keep the iterator over the lines in the file, but you need to ensure that findLineIndex does not start counting from 0 each time and instead continues from the previous count.
The easiest way to achieve this is to call enumerate outside the function and pass the resulting iterator to findLineIndex.
def findLineIndex(enumerated_lines, phase, minLineNum=-1):
    for lineNum, line in enumerated_lines: # do not call enumerate here
        # same as before
        if phase in line:
            if lineNum > minLineNum:
                return lineNum
    return -1
textFile = open("idk.txt")
enumerated_lines = enumerate(textFile) # instead call it here
print(findLineIndex(enumerated_lines, "3"))
# output 3
print(findLineIndex(enumerated_lines, "6"))
# output 6
print(findLineIndex(enumerated_lines, "3"))
# output -1, "3" was already found before
You want to find lines in any order and memory is not an issue
Store the lines in a list before searching in them. This way the search begins from the beginning each time.
textFile = open("idk.txt")
text_lines = textFile.readlines() # a list
print(findLineIndex(text_lines, "3"))
# output 3
print(findLineIndex(text_lines, "6"))
# output 6
print(findLineIndex(text_lines, "3"))
# output 3, would not have worked before
You want to find lines in any order but the file is too big to load it into memory at once
The simplest solution would be to reset the file iterator each time findLineIndex is called. To do this, call the file object's seek method with offset 0 (see Python reset line in for loop).
textFile = open("idk.txt")
print(findLineIndex(textFile, "3"))
# output 3
textFile.seek(0)
print(findLineIndex(textFile, "6"))
# output 6
textFile.seek(0)
print(findLineIndex(textFile, "3"))
# output 3
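If calling seek by hand before every search is easy to forget, the rewind can be moved into the function itself. The following is only a minimal sketch: the name find_line_index_rewinding is hypothetical, and io.StringIO stands in for the opened file so that the example runs on its own.

```python
import io

def find_line_index_rewinding(file, phase, min_line_num=-1):
    """Hypothetical variant of findLineIndex that rewinds the file first."""
    file.seek(0)  # start from the first line on every call
    for line_num, line in enumerate(file):
        if phase in line and line_num > min_line_num:
            return line_num
    return -1

# Stand-in for open("idk.txt") (assumption for illustration).
text_file = io.StringIO("".join(f"{n}\n" for n in range(11)))

print(find_line_index_rewinding(text_file, "3"))  # 3
print(find_line_index_rewinding(text_file, "6"))  # 6
print(find_line_index_rewinding(text_file, "3"))  # 3
```

The trade-off is that every search rereads the file from the top, so this suits occasional lookups better than many searches over a very large file.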
A more advanced solution would be to load the file contents into a database (e.g. using SQLite), which would allow searching the lines in any order without loading the whole file into memory at once.
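As a rough sketch of the database idea, the lines can be inserted into an SQLite table once and then queried in any order. Everything named here is an assumption for illustration: the table lines, its columns, the two helper functions, and the in-memory database (for a large file you would pass an on-disk database path instead of ":memory:"). io.StringIO again stands in for the opened file.

```python
import io
import sqlite3

def load_lines(connection, file):
    """Copy each line of `file` into a table, keyed by its line index."""
    connection.execute(
        "CREATE TABLE lines (line_num INTEGER PRIMARY KEY, content TEXT)"
    )
    # executemany accepts any iterable of (line_num, content) pairs.
    connection.executemany(
        "INSERT INTO lines (line_num, content) VALUES (?, ?)",
        enumerate(file),
    )
    connection.commit()

def find_line_index_sql(connection, phase, min_line_num=-1):
    """Return the first line index after min_line_num containing `phase`."""
    row = connection.execute(
        "SELECT MIN(line_num) FROM lines "
        "WHERE line_num > ? AND instr(content, ?) > 0",
        (min_line_num, phase),
    ).fetchone()
    # MIN() over zero matching rows yields NULL, which maps to "not found".
    return -1 if row[0] is None else row[0]

connection = sqlite3.connect(":memory:")
load_lines(connection, io.StringIO("".join(f"{n}\n" for n in range(11))))

print(find_line_index_sql(connection, "3"))  # 3
print(find_line_index_sql(connection, "6"))  # 6
print(find_line_index_sql(connection, "3"))  # 3
```

Note that a substring test like instr still scans the table row by row; the gain in this sketch is that the text lives on disk and can be searched repeatedly in any order, not that each search is faster.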
CodePudding user response:
I found the solution, as @mkrieger1 said:
As you have noticed, by passing the same textFile object to findLineIndex twice, it just continues where it left off the previous time, because it is an iterator over the lines in the file.
So one solution is to open the file inside the function on every call, which resets the read position to the start of the file. The new function looks like this:
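def findLineIndex(fileDir: str, phase, minLineNum=-1):
    # Open the file on every call so each search starts at the first line;
    # the with statement also closes the file when the function returns.
    with open(fileDir) as file:
        for lineNum, line in enumerate(file):
            if phase in line:
                if lineNum > minLineNum:
                    return lineNum
    return -1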
and textFile simply becomes the path to the text file: textFile = "idk.txt"