I am trying to concatenate the bytes of multiple NumPy arrays into a single bytearray to send in an HTTP POST request.
The most efficient way I can think of is to create a sufficiently large bytearray object and then write the bytes of all the NumPy arrays into it contiguously.
The code will look something like this:
import numpy as np

list_arr = [np.array([1, 2, 3]), np.array([4, 5, 6])]
total_nb_bytes = sum(a.nbytes for a in list_arr)
cb = bytearray(total_nb_bytes)
# TODO: generate the delimiters and metadata needed to decode the concatenated bytes
# Write each array's bytes into the preallocated buffer at a running offset
offset = 0
for arr in list_arr:
    _bytes = arr.tobytes()  # copies the array's data into a new bytes object
    cb[offset:offset + len(_bytes)] = _bytes
    offset += len(_bytes)
However, tobytes() is not a zero-copy method: it copies the raw data of the NumPy array into a new bytes object.
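A small sketch of what "not zero-copy" means here, using Python's built-in memoryview for comparison. The variable names are illustrative, and the array is assumed to be C-contiguous:

```python
import numpy as np

arr = np.array([1, 2, 3])

copied = arr.tobytes()   # independent bytes object holding a copy of the data
view = memoryview(arr)   # view onto the array's own memory, no copy

arr[0] = 42              # modify the array after taking both

n = arr.itemsize
# The copy still holds the old first element; the view sees the new one.
copy_is_stale = copied[:n] != arr.tobytes()[:n]
view_is_live = view[0] == 42
```

The bytes object returned by tobytes() is detached from the array, whereas the memoryview keeps reflecting the array's current contents.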
In Python, buffers give access to an object's underlying raw data (this is called the buffer protocol at the C level; see the Python documentation). NumPy offered this in NumPy 1.13 through a method called getbuffer(), but that method is deprecated!
What is the right way of doing this?
CodePudding user response:
Just use arr.data. This returns a memoryview object that references the array's memory without copying. It can be indexed and sliced (creating new memoryviews without copying) and appended to a bytearray (copying just once, into the bytearray).
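As a rough sketch of how this could be combined with the preallocated bytearray from the question: the loop below writes each array's memory into place through a memoryview of the destination. The cast("B") step and the offset bookkeeping are assumptions of this sketch, not part of the answer, and the arrays are assumed to be C-contiguous with a native dtype.

```python
import numpy as np

list_arr = [np.array([1, 2, 3]), np.array([4, 5, 6])]
total_nb_bytes = sum(a.nbytes for a in list_arr)

cb = bytearray(total_nb_bytes)
dest = memoryview(cb)  # writable byte view of the destination buffer

offset = 0
for arr in list_arr:
    # arr.data references the array's memory; cast("B") reinterprets it
    # as a flat sequence of unsigned bytes without copying.
    src = arr.data.cast("B")
    dest[offset:offset + arr.nbytes] = src  # single copy into cb
    offset += arr.nbytes
```

After the loop, cb holds the same bytes that joining each array's tobytes() output would produce, but no intermediate bytes objects are created.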
CodePudding user response:
You can make a NumPy-compatible buffer out of your message bytearray and write to it efficiently using the out argument of np.concatenate.
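import numpy as np

list_arr = [np.array([1, 2, 3]), np.array([4, 5, 6])]
total_nb_bytes = sum(a.nbytes for a in list_arr)
total_size = sum(a.size for a in list_arr)
cb = bytearray(total_nb_bytes)
# Wrap cb in an ndarray (no copy) and let concatenate write straight into it.
# This assumes 1-D arrays that all share the dtype of the first one.
out = np.ndarray(total_size, dtype=list_arr[0].dtype, buffer=cb)
np.concatenate(list_arr, out=out)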
And sure enough (the output below is from a platform where the default integer dtype is 4 bytes wide):
>>> cb
bytearray(b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00\x06\x00\x00\x00')
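For the decoding side that the question leaves as a to-do, here is a minimal sketch that assumes the receiver already knows the dtype and the number of elements in each array. How that metadata is transmitted is not covered by either answer, and the names sizes, flat and decoded are illustrative.

```python
import numpy as np

list_arr = [np.array([1, 2, 3]), np.array([4, 5, 6])]
dtype = list_arr[0].dtype
sizes = [a.size for a in list_arr]  # metadata the receiver would need

# Build the message buffer as in the answer above.
cb = bytearray(sum(a.nbytes for a in list_arr))
np.concatenate(list_arr, out=np.ndarray(sum(sizes), dtype=dtype, buffer=cb))

# Reinterpret the bytes as one flat array (a view on cb, not a copy),
# then split it back at the cumulative element counts.
flat = np.frombuffer(cb, dtype=dtype)
decoded = np.split(flat, np.cumsum(sizes)[:-1])
```

np.frombuffer and np.split both return views, so the decoded arrays share memory with cb rather than duplicating it.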