I want to get a bytearray from a big file, but when I convert the data I read with bytearray(), I briefly use double the RAM, which is a problem on a machine that doesn't have much. Here is an example that illustrates the issue. So my question is: how do I read a file directly into a bytearray?
0.5 GB buffers:
from io import BytesIO
mb = 1024 * 1024
gb = 1024 * mb
size = 512 * mb
file = BytesIO(b"\0" * size)
memory = 0
files = []
while True:
    file.seek(0)
    data = file.read()
    memory += size
    print("RAM usage: %4.1f GB" % (memory / gb))
    data = bytearray(data)  # copies data: briefly holds two copies in RAM
    print("RAM usage*: %4.1f GB" % (memory / gb))
    files.append(data)
Output:
RAM usage: 0.5 GB
RAM usage*: 0.5 GB
RAM usage: 1.0 GB
RAM usage*: 1.0 GB
RAM usage: 1.5 GB
Killed
[Program finished]
1 GB buffers:
...
size = 1 * gb
...
Output:
RAM usage: 1.0 GB
Killed
[Program finished]
CodePudding user response:
You can open the file in binary mode and use the raw stream's readinto
method to read the file directly into your pre-allocated bytearray object, without making an extra copy.
For example:
import os
buffer = bytearray(os.path.getsize(__file__))
with open(__file__, 'rb') as file:
    file.raw.readinto(buffer)
print(buffer)
outputs:
bytearray(b"import os\n\nbuffer = bytearray(os.path.getsize(__file__))\nwith open(__file__, \'rb\') as file:\n\tfile.raw.readinto(buffer)\n\nprint(buffer)")
Demo: https://replit.com/@blhsing/ExternalAngryMenus
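One caveat worth noting (my own addition, not from the answer above): a single readinto call may return fewer bytes than the buffer holds, so robust code loops with a memoryview until the buffer is full. The helper name read_file_into is hypothetical:

```python
# Hypothetical helper: readinto may return fewer bytes than requested,
# so keep calling it on a memoryview slice until the buffer is full.
def read_file_into(path: str, buf: bytearray) -> int:
    view = memoryview(buf)          # zero-copy view, so slicing allocates nothing
    total = 0
    with open(path, "rb", buffering=0) as f:  # buffering=0 gives the raw FileIO
        while total < len(buf):
            n = f.readinto(view[total:])
            if not n:               # EOF reached before the buffer was full
                break
            total += n
    return total
```

This also handles the case where the file is shorter than the buffer: the function returns how many bytes were actually read.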
CodePudding user response:
So, after @blhsing's suggestion, the code becomes this. Problem solved. :)
from os.path import getsize
mb = 1024 * 1024
gb = 1024 * mb
with open("test.file", "wb") as f:
    f.write(b"\0" * (512 * mb))
file = open("test.file", "rb")
memory = 0
files = []
while True:
    file.seek(0)
    buffer = bytearray(getsize("test.file"))  # pre-allocate the target buffer
    memory += getsize("test.file")
    print("RAM usage: %4.1f GB" % (memory / gb))
    file.readinto(buffer)  # fills the buffer in place, no extra copy
    print("RAM usage*: %4.1f GB" % (memory / gb))
    files.append(buffer)
Output:
RAM usage: 0.5 GB
RAM usage*: 0.5 GB
RAM usage: 1.0 GB
RAM usage*: 1.0 GB
RAM usage: 1.5 GB
RAM usage*: 1.5 GB
Killed
[Program finished]
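For anyone who wants to verify the difference without risking an OOM kill, here is a small sketch (my own, not from the thread) using tracemalloc: the bytearray(data) path briefly holds two copies of the data, while readinto fills one pre-allocated buffer in place.

```python
import tracemalloc

size = 16 * 1024 * 1024  # 16 MB is enough to see the effect

def peak_with_copy():
    tracemalloc.start()
    data = b"\0" * size
    copy = bytearray(data)      # data and copy alive at once -> peak near 2x size
    peak = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    return peak

def peak_with_readinto(path):
    tracemalloc.start()
    buf = bytearray(size)       # single pre-allocated buffer
    with open(path, "rb", buffering=0) as f:
        f.readinto(buf)         # fills buf in place -> peak near 1x size
    peak = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    return peak
```

On my understanding of tracemalloc, the first function should report a peak close to twice the buffer size and the second close to one buffer size, matching the before/after outputs in this thread.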