How to get the line index of a phrase in a file

I want to find the line index of a phrase in a text file, so I wrote this function:

def findLineIndex(file, phase, minLineNum=-1):
    for lineNum, line in enumerate(file):
        if phase in line:
            if lineNum > minLineNum:
                return lineNum

    return -1

This works just fine. I have a text file called idk.txt:

Using the function to find the line number of "3" works fine:

textFile = open("idk.txt")  # all the files are in the same folder
print(findLineIndex(textFile, "3"))
# output: 3

Here is the problem. I have this code:

textFile = open("idk.txt")  # all the files are in the same folder
print(findLineIndex(textFile, "3"))
# output 3

print(findLineIndex(textFile, "6"))
# output 2

The output is 3 and 2, but it is supposed to be 3 and 6. Running the program in debug mode shows that the second call continues reading the file where the first call left off, while lineNum starts again from 0. Every further call to findLineIndex likewise starts reading where the previous one left off. I have no idea why this happens; can someone please help me?

Answer:

As you have noticed, when you pass the same textFile object to findLineIndex twice, the second call just continues where the first one left off, because a file object is an iterator over the lines in the file.

This means that you can never find a line that comes before a line you have already found.
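The behaviour can be reproduced without a file on disk. This sketch uses io.StringIO as a stand-in for the opened file, with hypothetical contents (the lines "0" to "9", so "3" sits on line index 3 and "6" on line index 6); a real text file object behaves the same way because it is also an iterator over its lines.

```python
import io

def findLineIndex(file, phase, minLineNum=-1):
    # Same logic as the function in the question.
    for lineNum, line in enumerate(file):
        if phase in line and lineNum > minLineNum:
            return lineNum
    return -1

# Hypothetical stand-in for idk.txt: the lines "0" to "9".
textFile = io.StringIO("".join(f"{n}\n" for n in range(10)))

print(findLineIndex(textFile, "3"))  # 3
# The file iterator resumes after line "3", but the new enumerate() counts from 0 again.
print(findLineIndex(textFile, "6"))  # 2
```

The second call sees "4", "5", "6" as its lines 0, 1 and 2, which is exactly the 3-then-2 output from the question.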

Depending on your constraints, there are several options to solve this:

You only want to find lines in the order in which they appear in the file

If you do not need to look back at lines you have already searched, you can keep using the iterator over the lines in the file, but you need to ensure that findLineIndex does not start counting from 0 each time and instead continues from the previous count.

The easiest way to achieve this is to call enumerate outside the function and pass the resulting iterator to findLineIndex.

def findLineIndex(enumerated_lines, phase, minLineNum=-1):
    for lineNum, line in enumerated_lines:  # do not call enumerate here
        if phase in line and lineNum > minLineNum:
            return lineNum

    return -1

textFile = open("idk.txt")
enumerated_lines = enumerate(textFile)  # instead call it here

print(findLineIndex(enumerated_lines, "3"))
# output 3

print(findLineIndex(enumerated_lines, "6"))
# output 6

print(findLineIndex(enumerated_lines, "3"))
# output -1, "3" was already found before

You want to find lines in any order and memory is not an issue

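Store the lines in a list before searching them. Unlike a file, a list is not consumed by iterating over it, so each call to the original findLineIndex (the version from the question, which calls enumerate itself) starts again from the first line with the count at 0.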

textFile = open("idk.txt")
text_lines = textFile.readlines()  # a list

print(findLineIndex(text_lines, "3"))
# output 3

print(findLineIndex(text_lines, "6"))
# output 6

print(findLineIndex(text_lines, "3"))
# output 3, would not have worked before

You want to find lines in any order, but the file is too big to load into memory at once

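The simplest solution is to rewind the file before each call to findLineIndex by calling the file object's seek method with 0 (see the question "Python reset line in for loop"). Because the original findLineIndex calls enumerate itself, the count then restarts from 0 together with the file position.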

textFile = open("idk.txt")

print(findLineIndex(textFile, "3"))
# output 3

textFile.seek(0)
print(findLineIndex(textFile, "6"))
# output 6

textFile.seek(0)
print(findLineIndex(textFile, "3"))
# output 3

A more advanced and efficient solution would be to load the file contents into a database (e.g. using SQLite), which would allow you to search it in any order without loading it into memory at once.
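A minimal sketch of the database idea, using Python's built-in sqlite3 module. The table layout (a `lines` table with `line_no` and `content` columns) and the helper names load_lines and find_line_index are hypothetical. The file is copied into an on-disk database once, line by line; afterwards every search is an independent query, so searches can come in any order. SQLite's instr() is used for the substring test because, like Python's `in`, it is case-sensitive, whereas LIKE ignores ASCII case by default.

```python
import os
import sqlite3
import tempfile

def load_lines(db, path):
    """Copy every line of the text file at `path` into a hypothetical `lines` table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS lines (line_no INTEGER PRIMARY KEY, content TEXT)"
    )
    with open(path) as f:
        # enumerate(f) yields (line_no, content) pairs one at a time,
        # so the file is streamed instead of being read into memory in one go.
        db.executemany("INSERT INTO lines (line_no, content) VALUES (?, ?)", enumerate(f))
    db.commit()

def find_line_index(db, phrase, min_line_num=-1):
    """Return the first line index above min_line_num containing phrase, or -1."""
    row = db.execute(
        "SELECT line_no FROM lines WHERE instr(content, ?) > 0 AND line_no > ? "
        "ORDER BY line_no LIMIT 1",
        (phrase, min_line_num),
    ).fetchone()
    return row[0] if row is not None else -1

# Demo with hypothetical contents: the lines "0" to "9".
with tempfile.TemporaryDirectory() as tmpdir:
    txt_path = os.path.join(tmpdir, "idk.txt")
    with open(txt_path, "w") as f:
        f.writelines(f"{n}\n" for n in range(10))

    db = sqlite3.connect(os.path.join(tmpdir, "idk.db"))  # on-disk database file
    load_lines(db, txt_path)
    print(find_line_index(db, "3"))  # 3
    print(find_line_index(db, "6"))  # 6
    print(find_line_index(db, "3"))  # 3
    db.close()
```

Without a suitable index each query still scans the table, but it does so from disk rather than from a copy of the file held in memory.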

Answer:

I found the solution. As @mkrieger1 said:

As you have noticed, by passing the same textFile object to findLineIndex twice, it just continues where it left off the previous time, because it is an iterator over the lines in the file.

So one solution is simply to open the file again every time the function is called, which starts reading from the beginning of the file each time. The new function looks like this:

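def findLineIndex(fileDir: str, phase, minLineNum=-1):
    # Opening the file inside the function gives every call a fresh iterator,
    # and "with" closes the file again when the function returns.
    with open(fileDir) as file:
        for lineNum, line in enumerate(file):
            if phase in line and lineNum > minLineNum:
                return lineNum

    return -1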

and textFile now just holds the path of the text file, so textFile = "idk.txt".