Given that the width in bytes of each row of a numpy array equals the total width of the fields in a structure defined by a dtype, is there a simple way to convert such a numpy array to a structured array?
For example, my_type defines a data type with 5 bytes per data element in all fields: [('checksum','u2'), ('word', 'B', (3,))]. Then I want to convert the numpy array [[ 1 2 3 4 5] [ 11 12 13 14 15]] to the structured array [( 258, [ 3, 4, 5]) (2828, [13, 14, 15])].
My initial attempt was this:
import numpy as np
from random import randint
# generate data
array = np.array([(1,2,3,4,5),
                  (11,12,13,14,15)], dtype = np.uint8)
# format data
my_type = np.dtype([('checksum','u2'), ('word', 'B', (3,))])
structured_array = np.array([array], dtype=my_type)
But, as expected, because of numpy broadcasting rules, I get the following:
[[[( 1, [ 1, 1, 1]) ( 2, [ 2, 2, 2]) ( 3, [ 3, 3, 3])
( 4, [ 4, 4, 4]) ( 5, [ 5, 5, 5])]
[( 11, [ 11, 11, 11]) (12, [12, 12, 12]) (13, [13, 13, 13])
(14, [14, 14, 14]) (15, [15, 15, 15])]]]
My current not-so-elegant solution is to loop through the rows of an array and map them to the structure:
structured_array = np.zeros(array.shape[0], dtype=my_type)
for idx, row in enumerate(array):
    for key, value in my_type.fields.items():
        # value is (field dtype, byte offset): slice this field's bytes out of the row
        b = row[value[1]:value[1] + value[0].itemsize]
        if len(structured_array[idx][key].shape):
            structured_array[idx][key] = b
        else:
            structured_array[idx][key] = int.from_bytes(b, byteorder='big', signed=False)
So the question is whether there is a simple, one-line solution to perform this task for an arbitrary data type of a structured array, without parsing bytes of a numpy array?
CodePudding user response:
In [222]: x = np.array([[ 0, 2, 3, 4, 5], [ 0, 12, 13, 14, 15]])
In [223]: dt = np.dtype([('checksum','u2'), ('word', 'B', (3,))])
I know from past use that genfromtxt can handle relatively complex dtypes:
In [224]: np.savetxt('temp', x[:,1:], fmt='%d')
In [225]: cat temp
2 3 4 5
12 13 14 15
In [226]: data = np.genfromtxt('temp', dtype=dt)
In [227]: data
Out[227]:
array([( 2, [ 3, 4, 5]), (12, [13, 14, 15])],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
But I haven't dug into its code to see how it maps the flat row data on to the dtypes.
But it turns out that the unstructured_to_structured function that I mentioned in a comment can handle your dtype:
In [228]: import numpy.lib.recfunctions as rf
In [229]: rf.unstructured_to_structured(x[:,1:],dtype=dt)
Out[229]:
array([( 2, [ 3, 4, 5]), (12, [13, 14, 15])],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
But for simpler dtypes, I and others have often recommended turning the list of lists into a list of tuples:
In [230]: [tuple(row) for row in x[:,1:]]
Out[230]: [(2, 3, 4, 5), (12, 13, 14, 15)]
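As a minimal sketch of why the list of tuples matters: with a flat dtype (all scalar fields, no nested sub-array), numpy reads each tuple as one record. The dtype flat_dt and its field names 'a', 'b', 'c' are made up for illustration and are not from the question.

```python
import numpy as np

# Hypothetical flat dtype: four scalar fields, no nested sub-array.
flat_dt = np.dtype([('checksum', 'u2'), ('a', 'u1'), ('b', 'u1'), ('c', 'u1')])

x = np.array([[0, 2, 3, 4, 5], [0, 12, 13, 14, 15]])

# Each tuple supplies the values for one record, field by field.
records = np.array([tuple(row) for row in x[:, 1:]], dtype=flat_dt)
print(records)
```

Note that this assigns values to fields; it does not repack bytes.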
Many of the recfunctions use a field-by-field copy:
In [231]: res = np.zeros(2, dtype=dt)
In [232]: res
Out[232]:
array([(0, [0, 0, 0]), (0, [0, 0, 0])],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
In [233]: res['checksum']= x[:,1]
In [234]: res['word']
Out[234]:
array([[0, 0, 0],
[0, 0, 0]], dtype=uint8)
In [235]: res['word'] = x[:,2:]
In [236]: res
Out[236]:
array([( 2, [ 3, 4, 5]), (12, [13, 14, 15])],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
byte view
I missed the fact that you wanted to repack bytes. My answer above treats each input row as 4 numbers that are assigned to the 4 slots in the compound dtype. But with uint8 input, and u2 and u1 slots, you want to view the 5 bytes with the new dtype, not make a new array.
In [332]: dt
Out[332]: dtype([('checksum', '<u2'), ('word', 'u1', (3,))])
In [333]: arr = np.array([(1,2,3,4,5),
...: (11,12,13,14,15)], dtype = np.uint8)
In [334]: arr.view(dt)
Out[334]:
array([[( 513, [ 3, 4, 5])],
[(3083, [13, 14, 15])]],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
view adds a dimension that we need to remove:
In [335]: _.shape
Out[335]: (2, 1)
In [336]: arr.view(dt).reshape(2)
Out[336]:
array([( 513, [ 3, 4, 5]), (3083, [13, 14, 15])],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
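A quick way to check that view reinterprets the existing bytes instead of copying them is np.shares_memory. The sketch below reuses the arrays from this answer; the names viewed, shares and first_word are only for illustration.

```python
import numpy as np

dt = np.dtype([('checksum', '<u2'), ('word', 'u1', (3,))])
arr = np.array([(1, 2, 3, 4, 5), (11, 12, 13, 14, 15)], dtype=np.uint8)

# reshape(-1) drops the extra axis without hard-coding the row count.
viewed = arr.view(dt).reshape(-1)
shares = np.shares_memory(arr, viewed)

# A write to the original bytes shows up in the structured view.
arr[0, 2] = 99
first_word = viewed[0]['word'].tolist()
print(shares, first_word)
```

Because the two arrays share one buffer, copy the result (for example with .copy()) if the original array will be modified later.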
Changing the endianness of the u2 field to big-endian gives the values you asked for:
In [337]: dt = np.dtype([('checksum','>u2'), ('word', 'B', (3,))])
In [338]: arr.view(dt).reshape(2)
Out[338]:
array([( 258, [ 3, 4, 5]), (2828, [13, 14, 15])],
dtype=[('checksum', '>u2'), ('word', 'u1', (3,))])
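For an arbitrary dtype, the view-and-reshape steps could be wrapped in a small function. This is a sketch only: rows_to_structured is a hypothetical helper, not part of numpy, and it assumes the input is a 2-D uint8 array whose row width equals the dtype's itemsize.

```python
import numpy as np

def rows_to_structured(rows, dtype):
    """Reinterpret each row of a 2-D uint8 array as one record of dtype.

    Hypothetical helper for illustration; not a numpy function.
    """
    dtype = np.dtype(dtype)
    # view needs the row bytes laid out contiguously; this copies only if they are not.
    rows = np.ascontiguousarray(rows)
    if rows.dtype != np.uint8:
        raise ValueError("expected a uint8 array")
    if rows.ndim != 2 or rows.shape[1] != dtype.itemsize:
        raise ValueError("row width must equal dtype.itemsize")
    # view yields shape (n, 1); reshape(-1) flattens it to (n,).
    return rows.view(dtype).reshape(-1)

arr = np.array([(1, 2, 3, 4, 5), (11, 12, 13, 14, 15)], dtype=np.uint8)
big_endian_dt = [('checksum', '>u2'), ('word', 'B', (3,))]
result = rows_to_structured(arr, big_endian_dt)
print(result)
```

The byte order has to be stated in the dtype itself ('>u2' here), since the function only reinterprets the bytes and never swaps them.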