Read byte array DIRECTLY from file

Time:08-20

I want to read a big file into a bytearray, but calling bytearray() on the result of read() briefly uses double the RAM, which is a problem when memory is tight.

Here is an example that illustrates the issue. So my question is: how do I get a bytearray directly from a file?

0.5 GB buffers:

from io import BytesIO
mb = 1024 * 1024
gb = 1024 * mb
size = 512 * mb
file = BytesIO(b"\0" * size)
memory = 0
files = []
while True:
    file.seek(0)
    data = file.read()
    memory += size
    print("RAM usage:  %4.1f GB" % (memory / gb))
    data = bytearray(data)
    print("RAM usage*: %4.1f GB" % (memory / gb))
    files.append(data)

Output:

RAM usage:   0.5 GB
RAM usage*:  0.5 GB
RAM usage:   1.0 GB
RAM usage*:  1.0 GB
RAM usage:   1.5 GB
Killed

[Program finished]

1 GB buffers:

...
size = 1 * gb
...

Output:

RAM usage:   1.0 GB
Killed

[Program finished]

CodePudding user response:

You can open the file in binary mode and call the readinto method of its underlying raw stream to read the file directly into a pre-allocated bytearray object, without allocating an intermediate bytes object.

For example:

import os

buffer = bytearray(os.path.getsize(__file__))
with open(__file__, 'rb') as file:
    file.raw.readinto(buffer)

print(buffer)

outputs:

bytearray(b"import os\n\nbuffer = bytearray(os.path.getsize(__file__))\nwith open(__file__, 'rb') as file:\n    file.raw.readinto(buffer)\n\nprint(buffer)")

Demo: https://replit.com/@blhsing/ExternalAngryMenus
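One caveat worth hedging: a single readinto call on a raw stream is allowed to fill less than the whole buffer (for example, Linux caps one read call at roughly 2 GB). Below is a minimal sketch of a helper that keeps calling readinto until the buffer is full. The function name read_file_into_bytearray is hypothetical and not part of the answer above.

```python
import os


def read_file_into_bytearray(path):
    """Read a whole file into a pre-allocated bytearray without an extra copy."""
    size = os.path.getsize(path)
    buffer = bytearray(size)
    filled = 0
    # buffering=0 makes open() return the raw FileIO object directly.
    with open(path, "rb", buffering=0) as raw:
        # A memoryview lets us address the unfilled tail without copying.
        with memoryview(buffer) as view:
            while filled < size:
                with view[filled:] as chunk:
                    n = raw.readinto(chunk)
                if not n:  # 0 means end of file; None means no data available
                    break
                filled += n
    if filled < size:
        # The file shrank after getsize(); drop the unused tail.
        del buffer[filled:]
    return buffer
```

Calling read_file_into_bytearray("test.file") should then return the file contents while holding only one copy of the data in memory.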

CodePudding user response:

With @blhsing's suggestion, the code becomes the following. Problem solved. :)

from os.path import getsize
mb = 1024 * 1024
gb = 1024 * mb
with open("test.file", "wb") as f:  # close the file so the data is flushed to disk
    f.write(b"\0" * (512 * mb))
file = open("test.file", "rb")
memory = 0
files = []
while True:
    file.seek(0)
    buffer = bytearray(getsize("test.file"))
    memory += getsize("test.file")
    print("RAM usage: %4.1f GB" % (memory / gb))
    file.readinto(buffer)
    print("RAM usage*: %4.1f GB" % (memory / gb))
    files.append(buffer)

Output:

RAM usage:   0.5 GB
RAM usage*:  0.5 GB
RAM usage:   1.0 GB
RAM usage*:  1.0 GB
RAM usage:   1.5 GB
RAM usage*:  1.5 GB
Killed

[Program finished]
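The memory counter in the scripts above is bookkeeping only and does not measure the temporary copy. As a rough check that does not require running out of RAM, the sketch below uses tracemalloc to compare the peak allocation of both approaches on a small temporary file. The helper names (peak_bytes, read_with_copy, read_direct) and the 8 MiB size are assumptions chosen for illustration, and the numbers reflect only what CPython's allocator hooks can see.

```python
import os
import tempfile
import tracemalloc


def peak_bytes(func, *args):
    """Return the peak number of bytes tracemalloc saw while func(*args) ran."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    del result
    return peak


def read_with_copy(path):
    # The bytes object from read() and the bytearray copy coexist briefly.
    with open(path, "rb") as f:
        return bytearray(f.read())


def read_direct(path):
    # Only the pre-allocated bytearray holds the file contents.
    buffer = bytearray(os.path.getsize(path))
    with open(path, "rb") as f:
        f.readinto(buffer)
    return buffer


size = 8 * 1024 * 1024  # 8 MiB, small enough to run anywhere
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * size)
try:
    peak_copy = peak_bytes(read_with_copy, tmp.name)
    peak_direct = peak_bytes(read_direct, tmp.name)
finally:
    os.remove(tmp.name)

print("bytearray(f.read()): %.1f x file size" % (peak_copy / size))
print("readinto(buffer):    %.1f x file size" % (peak_direct / size))
```

On CPython I would expect the first figure to be close to twice the file size and the second close to the file size itself, which matches the behaviour described in the question.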