I'm trying to read the raw contents of a binary file so that they can be manipulated in memory. As far as I understand, bytes() objects are immutable, while bytearray() objects are mutable, so I read the file into a bytearray and then try to modify the latter:
raw_data = bytearray()
try:
    with open(input_file, "rb") as f:
        raw_data = f.read()
except IOError:
    print('Error opening', input_file)
raw_data[0] = 55  # attempt to modify the first byte
However, this last line results in a TypeError: 'bytes' object does not support item assignment.
Wait... what 'bytes' object?
Let's look into the actual data types reported by Python, before and after the array is populated:
raw_data = bytearray()
print('Before:', type(raw_data))
try:
    with open(input_file, "rb") as f:
        raw_data = f.read()
except IOError:
    print('Error opening', input_file)
print('After: ', type(raw_data))
Output:
Before: <class 'bytearray'>
After: <class 'bytes'>
So what's going on here? Why is the type modified, and can I prevent it?
I can always create another bytearray object from the contents of raw_data, but it'd be nice if I could save memory and just modify the original in place.
CodePudding user response:
Why is the type modified? Look at the following:
>>> x = 12
>>> type(x)
<class 'int'>
>>> x = 7.0
>>> type(x)
<class 'float'>
Sure, I assigned a value of 12 to x, and as a result x had type int. But then I assigned a new value of 7.0 to x, and that changed the type of the value that x refers to. This is Python's dynamic typing at work: a name has no fixed type of its own, only the object currently bound to it does.
So it doesn't matter that you initially assigned a bytearray instance to raw_data. What counts is the last assignment to raw_data, which was:
raw_data = f.read()
And the call to f.read() on a file opened in binary mode returns a bytes object.
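To make the rebinding visible, here is a small sketch. It uses io.BytesIO as a stand-in for a file opened with "rb", and the names original, rebound_type and same_object are made up for illustration; none of them come from your code.

```python
import io

# io.BytesIO stands in for a file opened with "rb" (assumption for illustration).
f = io.BytesIO(b"\x01\x02\x03")

raw_data = bytearray()
original = raw_data            # a second name for the same bytearray object

raw_data = f.read()            # rebinds the name raw_data to a new bytes object
rebound_type = type(raw_data)
same_object = raw_data is original

print(rebound_type)            # <class 'bytes'>
print(same_object)             # False
print(original)                # bytearray(b''), the first object was never touched
```

The empty bytearray still exists under the name original; the assignment only changed which object raw_data points to.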
The way to get around this is to pre-allocate a bytearray of the correct size and fill it with readinto:
with open(input_file, mode="rb") as f:
    # Seek to end of file and return offset from beginning:
    file_size = f.seek(0, 2)
    # Seek back to beginning:
    f.seek(0, 0)
    # Pre-allocate bytearray:
    raw_data = bytearray(file_size)
    f.readinto(raw_data)
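For completeness, the same idea could be wrapped in a function and checked end to end. This is only a sketch: read_into_bytearray is a hypothetical helper name, the temporary file exists just so the example runs on its own, and trimming the buffer to the number of bytes actually read is a defensive addition rather than something your case strictly needs.

```python
import os
import tempfile


def read_into_bytearray(path):
    """Return the file's contents as a mutable bytearray (hypothetical helper)."""
    with open(path, "rb") as f:
        # Offset of the end of the file equals its size in bytes.
        file_size = f.seek(0, os.SEEK_END)
        f.seek(0, os.SEEK_SET)
        raw_data = bytearray(file_size)
        bytes_read = f.readinto(raw_data)
    # Drop any unfilled tail if fewer bytes were read than expected.
    del raw_data[bytes_read:]
    return raw_data


# Demonstration with a throwaway file (assumption for illustration).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x01\x02\x03")
    demo_path = tmp.name

try:
    data = read_into_bytearray(demo_path)
    data[0] = 55               # item assignment now works
finally:
    os.remove(demo_path)

print(type(data))              # <class 'bytearray'>
print(data)                    # bytearray(b'7\x01\x02\x03')
```

The file contents land directly in the pre-allocated buffer, so no intermediate bytes object of the full file size is created.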