I'm trying to create a nested record array, but I am having trouble with the dimensions. I tried following the example in "How to set dtype for nested numpy ndarray?", but I am misunderstanding something. Below is a minimal reproducible example (MRE). The arrays are generated in a script, not imported from a CSV file.
import numpy as np
arr1 = np.array([4, 5, 4, 5])
arr2 = np.array([0, 0, -1, -1])
arr3 = np.array([0.51, 0.89, 0.59, 0.94])
arr4 = np.array(
[[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
).T
arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T
arrs = (arr1, arr2, arr3, arr4, arr5)
for i in arrs:
    print(i.shape, i)
For which the print statement returns:
(4,) [4 5 4 5]
(4,) [ 0 0 -1 -1]
(4,) [0.51 0.89 0.59 0.94]
(4, 3) [[0.52 0.41 0.68]
[0.8 0.71 1.12]
[0.62 0.46 0.78]
[1.1 0.77 1.19]]
(4, 3) [[0.6 0.2 0.2]
[0.6 0.2 0.2]
[0.6 0.2 0.2]
[0.6 0.2 0.2]]
However, the ans line throws an error:
dtypes = [
("state", "f8"),
("variability", "f8"),
("target", "f8"),
("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
]
ans = np.column_stack(arrs).view(dtype=dtypes)
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
Problem 1: How do I get the desired array output?
print(np.column_stack(arrs))
returns
[[ 4. 0. 0.51 0.52 0.41 0.68 0.6 0.2 0.2 ]
[ 5. 0. 0.89 0.8 0.71 1.12 0.6 0.2 0.2 ]
[ 4. -1. 0.59 0.62 0.46 0.78 0.6 0.2 0.2 ]
[ 5. -1. 0.94 1.1 0.77 1.19 0.6 0.2 0.2 ]]
But the desired output looks like this:
[[4 0 0.51 (0.52, 0.41, 0.68) (0.6, 0.2, 0.2)]
[5 0 0.89 (0.8, 0.71, 1.12) (0.6, 0.2, 0.2)]
[4 -1 0.59 (0.62, 0.46, 0.78) (0.6, 0.2, 0.2)]
[5 -1 0.94 (1.1, 0.77, 1.19) (0.6, 0.2, 0.2)]]
Problem 2: How do I include the dtype.names?
print(rec_array.dtype.names)
should return:
('state', 'variability', 'target', 'measured', 'var')
and print(rec_array['measured'].dtype.names)
should return:
('mean', 'low', 'hi')
and similarly for the names of the other nested array.
Answer:
With your dtype:
In [2]: dtypes = [
...: ("state", "f8"),
...: ("variability", "f8"),
...: ("target", "f8"),
...: ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
...: ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
...: ]
A 2 element zeros array looks like:
In [3]: arr = np.zeros(2,dtypes)
In [4]: arr
Out[4]:
array([(0., 0., 0., [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)], [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)]),
(0., 0., 0., [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)], [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)])],
dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,)), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,))])
Using recfunctions, I can map that to an unstructured array:
In [5]: import numpy.lib.recfunctions as rf
In [6]: uarr = rf.structured_to_unstructured(arr)
In [7]: uarr
Out[7]:
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
In [8]: uarr.shape
Out[8]: (2, 27)
That shows that your dtypes describes 27 values per record (3 scalars plus 2 nested fields of 4 x 3 values each), not the 9 that you seem to expect (from your column stack).
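The count of 27 can also be checked from the byte sizes. This is a minimal sketch, assuming 8-byte floats throughout; the names `nested` and `stacked_row_bytes` are only illustrative and do not appear in the question.

```python
import numpy as np

# The dtype from the question, with the (4,) subarray shape on the nested fields.
nested = np.dtype([
    ("state", "f8"),
    ("variability", "f8"),
    ("target", "f8"),
    ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
    ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
])

# 3 scalars + 2 fields * 4 elements * 3 floats = 27 floats per record.
print(nested.itemsize)       # 216 bytes
print(nested.itemsize // 8)  # 27

# One row of the column-stacked (4, 9) float64 array is only 9 * 8 bytes.
stacked_row_bytes = 9 * np.dtype("f8").itemsize
print(stacked_row_bytes)                    # 72
print(stacked_row_bytes % nested.itemsize)  # 72, not 0, so view() refuses
```

A 72-byte row cannot be split into 216-byte records, which is the condition the ValueError message describes.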
Making a new (2,27) array, I can create a structured array:
In [9]: uarr = np.arange(2*27).reshape(2,27)
In [18]: rf.unstructured_to_structured(uarr, dtype=np.dtype(dtypes))
Out[18]:
array([( 0., 1., 2., [( 3., 4., 5.), ( 6., 7., 8.), ( 9., 10., 11.), (12., 13., 14.)], [(15., 16., 17.), (18., 19., 20.), (21., 22., 23.), (24., 25., 26.)]),
(27., 28., 29., [(30., 31., 32.), (33., 34., 35.), (36., 37., 38.), (39., 40., 41.)], [(42., 43., 44.), (45., 46., 47.), (48., 49., 50.), (51., 52., 53.)])],
dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,)), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,))])
view still has problems with this. In some simple cases view does work, though it can require some adjustment of the dimensions. But I have not explored its limitations:
In [19]: uarr.view(np.dtype(dtypes))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [19], in <cell line: 1>()
----> 1 uarr.view(np.dtype(dtypes))
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
Edit: removing the (4,) from dtypes:
In [35]: dtypes = [
...: ("state", "f8"),
...: ("variability", "f8"),
...: ("target", "f8"),
...: ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")]),
...: ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")]),
...: ]
In [36]: arr = np.zeros(2,dtypes)
In [37]: arr
Out[37]:
array([(0., 0., 0., (0., 0., 0.), (0., 0., 0.)),
(0., 0., 0., (0., 0., 0.), (0., 0., 0.))],
dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])
In [38]: uarr = np.arange(18).reshape(2,9)
In [39]: arr1 = rf.unstructured_to_structured(uarr, dtype=np.dtype(dtypes))
In [40]: arr1
Out[40]:
array([(0., 1., 2., ( 3., 4., 5.), ( 6., 7., 8.)),
(9., 10., 11., (12., 13., 14.), (15., 16., 17.))],
dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])
In [43]: arr1['measured']
Out[43]:
array([( 3., 4., 5.), (12., 13., 14.)],
dtype=[('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')])
In [44]: arr1['measured']['mean']
Out[44]: array([ 3., 12.])
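The same nested indexing can be used for assignment, which gives another way to fill such an array without going through an unstructured intermediate. This is only a sketch: the input arrays `state` and `measured` are hypothetical two-record stand-ins, and `dt` repeats the dtype without the (4,) shape.

```python
import numpy as np

dt = np.dtype([
    ("state", "f8"),
    ("variability", "f8"),
    ("target", "f8"),
    ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")]),
    ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")]),
])

# Hypothetical inputs: two records' worth of values.
state = np.array([4, 5])
measured = np.array([[0.52, 0.41, 0.68], [0.80, 0.71, 1.12]])

out = np.zeros(2, dtype=dt)
out["state"] = state  # integers are cast to f8 on assignment

# Indexing a single field returns a view, so chained assignment writes into `out`.
for col, name in enumerate(dt["measured"].names):
    out["measured"][name] = measured[:, col]

print(out["measured"]["low"])
```

Fields that are never assigned (here `variability`, `target` and `var`) simply keep the zeros they were created with.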
And via a csv and genfromtxt:
In [45]: np.savetxt('foo', uarr)
In [46]: more foo
0.000000000000000000e+00 1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00 4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00 7.000000000000000000e+00 8.000000000000000000e+00
9.000000000000000000e+00 1.000000000000000000e+01 1.100000000000000000e+01 1.200000000000000000e+01 1.300000000000000000e+01 1.400000000000000000e+01 1.500000000000000000e+01 1.600000000000000000e+01 1.700000000000000000e+01
In [47]: data = np.genfromtxt('foo', dtype=dtypes)
In [48]: data
Out[48]:
array([(0., 1., 2., ( 3., 4., 5.), ( 6., 7., 8.)),
(9., 10., 11., (12., 13., 14.), (15., 16., 17.))],
dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])
view still does not work.
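For what it's worth, the check named in the error message is purely about byte counts, so view may succeed when each row of the source array holds exactly one record's worth of bytes. The sketch below assumes a C-contiguous float64 array, and the name `farr` is illustrative. Note that np.arange returns an integer array whose item size can differ by platform and NumPy version; that may or may not be what trips the call in this transcript.

```python
import numpy as np

dt = np.dtype([
    ("state", "f8"),
    ("variability", "f8"),
    ("target", "f8"),
    ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")]),
    ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")]),
])

# float64 rows: 9 * 8 = 72 bytes, the same as dt.itemsize.
farr = np.arange(18, dtype="f8").reshape(2, 9)
assert farr.shape[1] * farr.itemsize == dt.itemsize

# view() keeps the number of dimensions, so the result has shape (2, 1);
# indexing the last axis drops it.
viewed = farr.view(dt)[:, 0]
print(viewed.shape)
print(viewed["measured"]["mean"])
```

A view reinterprets the existing bytes and does no casting, so the source must already be float64; with integer input the values would come out wrong even where the sizes happen to match.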
With your data:
In [50]: arr1 = np.array([4, 5, 4, 5])
...: arr2 = np.array([0, 0, -1, -1])
...: arr3 = np.array([0.51, 0.89, 0.59, 0.94])
...: arr4 = np.array(
...: [[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
...: ).T
...: arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T
...: arrs = (arr1, arr2, arr3, arr4, arr5)
In [51]: ans = np.column_stack(arrs)
In [52]: ans
Out[52]:
array([[ 4. , 0. , 0.51, 0.52, 0.41, 0.68, 0.6 , 0.2 , 0.2 ],
[ 5. , 0. , 0.89, 0.8 , 0.71, 1.12, 0.6 , 0.2 , 0.2 ],
[ 4. , -1. , 0.59, 0.62, 0.46, 0.78, 0.6 , 0.2 , 0.2 ],
[ 5. , -1. , 0.94, 1.1 , 0.77, 1.19, 0.6 , 0.2 , 0.2 ]])
In [53]: arr2 = rf.unstructured_to_structured(ans, dtype=np.dtype(dtypes))
In [54]: arr2
Out[54]:
array([(4., 0., 0.51, (0.52, 0.41, 0.68), (0.6, 0.2, 0.2)),
(5., 0., 0.89, (0.8 , 0.71, 1.12), (0.6, 0.2, 0.2)),
(4., -1., 0.59, (0.62, 0.46, 0.78), (0.6, 0.2, 0.2)),
(5., -1., 0.94, (1.1 , 0.77, 1.19), (0.6, 0.2, 0.2))],
dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])
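Putting the pieces together as a standalone script, and checking the field names asked about in Problem 2. This is a sketch under the assumption that the (4,) shape is dropped from the nested fields; `rec_array` is the name used in the question, and `dt` is introduced here for illustration.

```python
import numpy as np
import numpy.lib.recfunctions as rf

arr1 = np.array([4, 5, 4, 5])
arr2 = np.array([0, 0, -1, -1])
arr3 = np.array([0.51, 0.89, 0.59, 0.94])
arr4 = np.array(
    [[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
).T
arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T

# Nested dtype without the (4,) subarray shape: 9 floats per record.
dt = np.dtype([
    ("state", "f8"),
    ("variability", "f8"),
    ("target", "f8"),
    ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")]),
    ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")]),
])

# column_stack gives a (4, 9) float array; each row becomes one record.
rec_array = rf.unstructured_to_structured(np.column_stack((arr1, arr2, arr3, arr4, arr5)), dtype=dt)

print(rec_array.shape)                    # (4,)
print(rec_array.dtype.names)              # ('state', 'variability', 'target', 'measured', 'var')
print(rec_array["measured"].dtype.names)  # ('mean', 'low', 'hi')
print(rec_array["var"].dtype.names)       # ('mid', 'low', 'hi')
```

The result is one-dimensional with one record per row of the stacked array, so the nesting lives in the dtype rather than in extra array dimensions.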