Data loss when Python reads a 13 MB file

Time:10-27

def get_txt(path):
    with open(path, 'r', encoding='utf-8') as file:
        while file.readline():
            print(file.readline())

if __name__ == '__main__':
    path = 'data/data.html'
    get_txt(path)

This is my code. It reads the source file line by line and prints each line to the console. But when I use Ctrl+F to search the console output, I can't find the data I want. The file is being read, but I can't tell where the reading starts, and some of the data is missing.

My file is an HTML file, 13 MB in size. The first line printed to the console is not:

<!DOCTYPE html>

even though that is the first line of my source file. The last line printed is:

</html>

which is what I expected. I've tried searching with Ctrl+F, but the results are never what I expect.

Answer:

Presumably you want something like this:

def get_txt(path):
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            print(line)

This prints every line in the file without skipping any, whereas your original code printed only every other line.
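One side effect of the loop above: each line yielded by iterating over a text file keeps its trailing newline, and print() appends another, so the console output is double-spaced. A minimal sketch follows, assuming an in-memory io.StringIO in place of the real data/data.html; passing end='' to print() avoids the extra blank lines.

```python
import io

# io.StringIO stands in for the real file opened with
# open('data/data.html', 'r', encoding='utf-8')
sample = io.StringIO('<!DOCTYPE html>\n<html>\n</html>\n')

lines = list(sample)            # iterating a file yields one line at a time
print(repr(lines[0]))           # '<!DOCTYPE html>\n' -> the newline is kept

sample.seek(0)                  # rewind so we can iterate again
for line in sample:
    # print() adds its own newline; end='' avoids a blank line after each one
    print(line, end='')
```

Stripping with line.rstrip('\n') before printing works too; end='' simply leaves the text exactly as it is stored.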

Answer:

    while file.readline():
        print(file.readline())

The while condition calls readline() and throws away the line it returns; then the readline() inside print() reads the next line, and that is the one that gets printed.

You're only printing every other line.
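To see the skipping concretely, here is a small sketch that mimics the question's loop on an in-memory file (io.StringIO stands in for data/data.html, and the helper name is hypothetical). Instead of printing, it collects what would have been printed.

```python
import io


def lines_printed_by_original_loop(file):
    """Mimic the question's while/readline loop, collecting what it prints."""
    printed = []
    while file.readline():               # reads line 1, 3, 5, ... and discards it
        printed.append(file.readline())  # reads the next line (or '' at end of file)
    return printed


sample = io.StringIO('line 1\nline 2\nline 3\nline 4\nline 5\n')
print(lines_printed_by_original_loop(sample))
```

The first thing "printed" is line 2, which is why <!DOCTYPE html> never appeared at the top. With an even number of lines the last line printed is the file's last line (consistent with </html> showing up), while with an odd number the final readline() hits end of file and returns '', so an empty line is printed instead.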

Try this instead:

for line in file:
    print(line)
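If you would rather keep the while-loop shape, the fix is to call readline() only once per pass and reuse its result. A minimal sketch, assuming Python 3.8+ for the assignment expression (:=), with io.StringIO again standing in for the real file and a hypothetical helper name:

```python
import io


def read_all_lines(file):
    """Keep the while-loop shape, but call readline() only once per pass."""
    collected = []
    # The walrus operator (Python 3.8+) stores the line it just tested,
    # so the same line is both checked and used.
    while line := file.readline():
        collected.append(line)
    return collected


sample = io.StringIO('line 1\nline 2\nline 3\n')
for line in read_all_lines(sample):
    print(line, end='')
```

A blank line in the file is read as '\n', which is truthy, so the loop stops only at end of file, where readline() returns ''. Iterating with for line in file is still the more idiomatic choice.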

Answer:

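The code below is slightly adjusted so that it works. You can replace the code you posted with it, and it should behave as intended. (readlines() reads the whole file into a list first, which is fine for 13 MB; iterating over the file object directly avoids holding everything in memory at once.)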

def get_txt(path_):
    with open(path_, 'r', encoding='utf-8') as file:
        for line in file.readlines():
            print(line)


if __name__ == '__main__':
    path = 'data/data.html'
    get_txt(path)

Answer:

(Posted on behalf of the question author, to move the answer to the answers section).

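The problem was solved by changing the PyCharm configuration: in the PyCharm/bin/idea.properties file, set idea.cycle.buffer.size=disabled. By default, the PyCharm console keeps only the most recent output in a cyclic buffer (1024 KB), and older lines are deleted, so the beginning of the output from a 13 MB file was discarded before it could be searched.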
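If changing the IDE setting is not an option, one alternative (not from the original thread) is to avoid the console entirely: write the lines to a file and search that file in an editor. A minimal sketch, assuming a hypothetical helper copy_lines and a throwaway file standing in for data/data.html; the returned line count also gives a quick check that nothing was lost on the Python side.

```python
import os
import tempfile


def copy_lines(src_path, dst_path):
    """Write every line of src_path to dst_path and return the line count.

    Writing to a file sidesteps the console's output limit entirely.
    """
    count = 0
    with open(src_path, 'r', encoding='utf-8') as src, \
         open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:
            dst.write(line)  # line already ends with '\n' (except possibly the last)
            count += 1
    return count


if __name__ == '__main__':
    # A throwaway file stands in for data/data.html in this demo.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'data.html')
        dst = os.path.join(tmp, 'output.txt')
        with open(src, 'w', encoding='utf-8') as f:
            f.write('<!DOCTYPE html>\n<html>\n</html>\n')
        print(copy_lines(src, dst), 'lines written')  # 3 lines written
```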
