Read nth line of importlib.resources.files without loading whole file in memory

Time:01-10

I thought I would be able to use an importlib.resources.files(...).open() object just like a file:

from importlib.resources import files


with files('data').joinpath('graphics.txt').open() as graphics:
    for i, line in graphics:
        if i == 4:
            print(line)

but I get a ValueError: too many values to unpack (expected 2)

The project structure looks like this:

.
├── __init__.py
└── data
    ├── __init__.py (empty)
    └── graphics.txt

sample graphics.txt:

line 0
line 1
line 2
line 3
line 4
line 5
line 6

I want to read a single line whose number I already know, spending as little time as possible getting to that line and without loading the whole file into memory (the file is fairly large, though not huge).

CodePudding user response:

It works just fine; you just forgot enumerate:

for i, line in graphics:

should be:

for i, line in enumerate(graphics):

Without enumerate, you're only getting the lines, not the line number.

If you want a micro-optimization, you can use itertools.islice to avoid explicitly looping over and checking line numbers for the lines you don't care about:

from itertools import islice  # At top of file

with files('data').joinpath('graphics.txt').open() as graphics:
    line = next(islice(graphics, 4, None), None)
    if line is not None:
        print(line)

But to be clear, the preceding lines are still being read (and immediately discarded) under the hood; you can't skip variable-length lines without scanning from the beginning.
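The islice approach can be wrapped in a small reusable helper. This is only a sketch: read_nth_line is a hypothetical name, and io.StringIO stands in for the opened resource so the example runs without the data package.

```python
from io import StringIO
from itertools import islice


def read_nth_line(lines, n, default=None):
    """Return the n-th (0-based) line of an iterable of lines, or default."""
    # islice consumes and discards the first n lines lazily, one at a time,
    # so the whole file is never held in memory.
    return next(islice(lines, n, None), default)


# StringIO stands in for files('data').joinpath('graphics.txt').open().
sample = StringIO("".join(f"line {k}\n" for k in range(7)))
fifth = read_nth_line(sample, 4)

# A file that is too short yields the default instead of raising.
missing = read_nth_line(StringIO("only one line\n"), 4)

print(repr(fifth))  # 'line 4\n'
print(missing)      # None
```

Any iterable of lines works here, so the same helper should accept the object returned by open() on the resource.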

CodePudding user response:

Change this:

for i, line in graphics:

Into this:

for i, line in enumerate(graphics, start=1):
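Note that start=1 makes the counter 1-based, so the i == 4 check from the question would then match the fourth line rather than the fifth. A small sketch of the difference, using io.StringIO as a stand-in for the opened resource:

```python
from io import StringIO

text = "".join(f"line {k}\n" for k in range(7))

# Default start=0: i == 4 selects the fifth line.
zero_based = [line for i, line in enumerate(StringIO(text)) if i == 4]

# start=1: i == 4 now selects the fourth line.
one_based = [line for i, line in enumerate(StringIO(text), start=1) if i == 4]

print(zero_based)  # ['line 4\n']
print(one_based)   # ['line 3\n']
```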

CodePudding user response:

The original code is giving a "too many values to unpack" error because the for loop is unpacking two variables (i and line) from each iteration of the loop, but the object being iterated over (the file object) does not contain elements that can be unpacked into two variables. Instead, the file object yields each line of the file as a string when it is iterated over.
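The failure can be reproduced in isolation. In this sketch io.StringIO is an assumption standing in for the opened resource; like the real file object, it yields one string per line when iterated.

```python
from io import StringIO

# StringIO stands in for the opened resource; both iterate line by line.
graphics = StringIO("line 0\nline 1\n")

first = next(iter(graphics))
print(type(first).__name__, repr(first))  # str 'line 0\n'

try:
    # The same unpacking the original loop attempts on every line:
    # the string's characters are spread across the two names.
    i, line = first
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The string has more than two characters, so Python cannot distribute them across i and line and raises the ValueError.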

To fix this error, you can use the enumerate() function to loop over the lines of the file while keeping track of the current line number. Each iteration then yields a pair that unpacks into the line number (i) and the line (line), so you can check whether the current line is the fifth one (index 4) and print it.

Try this:

from importlib.resources import files

# Open the file using a context manager
with files('data').joinpath('graphics.txt').open() as graphics:
    for i, line in enumerate(graphics):
        if i == 4:
            print(line)