Column stacking nested numpy structured array, help getting dims right

I'm trying to create a nested record array, but I am having trouble with the dimensions. I tried following the example in "how to set dtype for nested numpy ndarray?", but I am misunderstanding something. Below is an MRE. The arrays are generated in a script, not imported from CSV.

import numpy as np

arr1 = np.array([4, 5, 4, 5])
arr2 = np.array([0, 0, -1, -1])
arr3 = np.array([0.51, 0.89, 0.59, 0.94])
arr4 = np.array(
    [[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
).T
arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T
arrs = (arr1, arr2, arr3, arr4, arr5)

for i in arrs:
    print(i.shape, i)

For which the print statement returns:

(4,) [4 5 4 5]
(4,) [ 0  0 -1 -1]
(4,) [0.51 0.89 0.59 0.94]
(4, 3) [[0.52 0.41 0.68]
 [0.8  0.71 1.12]
 [0.62 0.46 0.78]
 [1.1  0.77 1.19]]
(4, 3) [[0.6 0.2 0.2]
 [0.6 0.2 0.2]
 [0.6 0.2 0.2]
 [0.6 0.2 0.2]]

However, the line that assigns ans raises an error:

dtypes = [
        ("state", "f8"),
        ("variability", "f8"),
        ("target", "f8"),
        ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
        ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
]
ans = np.column_stack(arrs).view(dtype=dtypes)

ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

Problem 1: How do I get the desired array output? print(np.column_stack(arrs)) returns

[[ 4.    0.    0.51  0.52  0.41  0.68  0.6   0.2   0.2 ]
 [ 5.    0.    0.89  0.8   0.71  1.12  0.6   0.2   0.2 ]
 [ 4.   -1.    0.59  0.62  0.46  0.78  0.6   0.2   0.2 ]
 [ 5.   -1.    0.94  1.1   0.77  1.19  0.6   0.2   0.2 ]]

But the desired output looks like this:

[[4 0 0.51 (0.52, 0.41, 0.68) (0.6, 0.2, 0.2)]
 [5 0 0.89 (0.8, 0.71, 1.12) (0.6, 0.2, 0.2)]
 [4 -1 0.59 (0.62, 0.46, 0.78) (0.6, 0.2, 0.2)]
 [5 -1 0.94 (1.1, 0.77, 1.19) (0.6, 0.2, 0.2)]]

Problem 2: How do I include the dtype.names?

print(rec_array.dtype.names) should return: ('state', 'variability', 'target', 'measured', 'var')

and print(rec_array['measured'].dtype.names) should return: ('mean', 'low', 'hi')

and similarly for the names of the other nested array.

Answer:

With your dtype:

In [2]: dtypes = [
   ...:         ("state", "f8"),
   ...:         ("variability", "f8"),
   ...:         ("target", "f8"),
   ...:         ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
   ...:         ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
   ...: ]

A 2 element zeros array looks like:

In [3]: arr = np.zeros(2,dtypes)    
In [4]: arr
Out[4]: 
array([(0., 0., 0., [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)], [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)]),
       (0., 0., 0., [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)], [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)])],
      dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,)), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,))])

Using recfunctions I can map that to an unstructured array:

In [5]: import numpy.lib.recfunctions as rf    
In [6]: uarr = rf.structured_to_unstructured(arr)    
In [7]: uarr
Out[7]: 
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])    
In [8]: uarr.shape
Out[8]: (2, 27)

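That says that each record of your dtypes holds 27 values, not the 9 that you seem to expect (from your column stack): 3 scalar fields plus two (4,) subarrays of 3-field records, i.e. 3 + 4*3 + 4*3 = 27.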
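The same count can be read off the byte sizes, which is what the ValueError in the question is comparing. A minimal sketch, assuming float64 data such as np.column_stack produces for the arrays in the question; the names nested_with_subarray, nested_flat and row_bytes are illustrative only and do not appear in the original post:

```python
import numpy as np

scalars = [("state", "f8"), ("variability", "f8"), ("target", "f8")]
triple_m = [("mean", "f8"), ("low", "f8"), ("hi", "f8")]
triple_v = [("mid", "f8"), ("low", "f8"), ("hi", "f8")]

# dtype from the question: each nested field is a (4,) subarray of triples
nested_with_subarray = np.dtype(
    scalars + [("measured", triple_m, (4,)), ("var", triple_v, (4,))]
)
# the same layout without the (4,): one triple per record
nested_flat = np.dtype(scalars + [("measured", triple_m), ("var", triple_v)])

# one row of an (n, 9) float64 array occupies 9 * 8 bytes
row_bytes = 9 * np.dtype("f8").itemsize

print(nested_with_subarray.itemsize // 8)  # 27 f8 values per record
print(nested_flat.itemsize // 8)           # 9 f8 values per record
print(row_bytes)                           # 72
```

A 216-byte record cannot be laid over a 72-byte row, which is consistent with the error message about the dtype size having to divide the size of the last axis.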

Making a new (2,27) array, I can create a structured array:

In [9]: uarr = np.arange(2*27).reshape(2,27)
In [18]: rf.unstructured_to_structured(uarr, dtype=np.dtype(dtypes))
Out[18]: 
array([( 0.,  1.,  2., [( 3.,  4.,  5.), ( 6.,  7.,  8.), ( 9., 10., 11.), (12., 13., 14.)], [(15., 16., 17.), (18., 19., 20.), (21., 22., 23.), (24., 25., 26.)]),
       (27., 28., 29., [(30., 31., 32.), (33., 34., 35.), (36., 37., 38.), (39., 40., 41.)], [(42., 43., 44.), (45., 46., 47.), (48., 49., 50.), (51., 52., 53.)])],
      dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,)), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,))])

view still has problems with this. In some simple cases view does work, though it can require some adjustment of the dimensions; I have not explored its limitations:

In [19]: uarr.view(np.dtype(dtypes))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [19], in <cell line: 1>()
----> 1 uarr.view(np.dtype(dtypes))

ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

Edit:

Removing the (4,) from dtypes, so that each nested field holds a single triple per record:

In [35]: dtypes = [
    ...:         ("state", "f8"),
    ...:         ("variability", "f8"),
    ...:         ("target", "f8"),
    ...:         ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")]),
    ...:         ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")]),
    ...: ]

In [36]: arr = np.zeros(2,dtypes)

In [37]: arr
Out[37]: 
array([(0., 0., 0., (0., 0., 0.), (0., 0., 0.)),
       (0., 0., 0., (0., 0., 0.), (0., 0., 0.))],
      dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])

In [38]: uarr = np.arange(18).reshape(2,9)

In [39]: arr1 = rf.unstructured_to_structured(uarr, dtype=np.dtype(dtypes))

In [40]: arr1
Out[40]: 
array([(0.,  1.,  2., ( 3.,  4.,  5.), ( 6.,  7.,  8.)),
       (9., 10., 11., (12., 13., 14.), (15., 16., 17.))],
      dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])

In [43]: arr1['measured']
Out[43]: 
array([( 3.,  4.,  5.), (12., 13., 14.)],
      dtype=[('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')])

In [44]: arr1['measured']['mean']
Out[44]: array([ 3., 12.])
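The nested field access shown above can also be used for assignment, which gives a way to build the array without stacking at all. This is a sketch of an alternative, not the method used in this answer: it assumes the arrays from the question and the dtype without the (4,), and the names dt and rec are made up for the example.

```python
import numpy as np

arr1 = np.array([4, 5, 4, 5])
arr2 = np.array([0, 0, -1, -1])
arr3 = np.array([0.51, 0.89, 0.59, 0.94])
arr4 = np.array(
    [[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
).T
arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T

dt = np.dtype([
    ("state", "f8"),
    ("variability", "f8"),
    ("target", "f8"),
    ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")]),
    ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")]),
])

# allocate one record per row, then fill field by field
rec = np.zeros(len(arr1), dtype=dt)
rec["state"] = arr1
rec["variability"] = arr2
rec["target"] = arr3

# rec["measured"] is a view onto rec, so writing to its fields fills rec itself
for col, name in enumerate(dt["measured"].names):
    rec["measured"][name] = arr4[:, col]
for col, name in enumerate(dt["var"].names):
    rec["var"][name] = arr5[:, col]

print(rec["measured"]["mean"])
```

Filling by field name is more verbose than a single conversion call, but it does not depend on the column order of a stacked array.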

The same works via a text file and genfromtxt:

In [45]: np.savetxt('foo', uarr)

In [46]: more foo
0.000000000000000000e+00 1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00 4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00 7.000000000000000000e+00 8.000000000000000000e+00
9.000000000000000000e+00 1.000000000000000000e+01 1.100000000000000000e+01 1.200000000000000000e+01 1.300000000000000000e+01 1.400000000000000000e+01 1.500000000000000000e+01 1.600000000000000000e+01 1.700000000000000000e+01

In [47]: data = np.genfromtxt('foo', dtype=dtypes)

In [48]: data
Out[48]: 
array([(0.,  1.,  2., ( 3.,  4.,  5.), ( 6.,  7.,  8.)),
       (9., 10., 11., (12., 13., 14.), (15., 16., 17.))],
      dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])

view still does not work.
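A hedged aside on view: uarr in these sessions comes from np.arange, which produces integers, so a view could not reinterpret its bytes as f8 fields meaningfully in any case. The sketch below makes two assumptions that differ from the session above: the data is float64, and the array is C-contiguous with rows exactly dt.itemsize bytes long. Under those assumptions view is expected to succeed and leave a trailing axis of length 1. The names uarr_f, viewed and rec are illustrative only.

```python
import numpy as np

dt = np.dtype([
    ("state", "f8"),
    ("variability", "f8"),
    ("target", "f8"),
    ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")]),
    ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")]),
])

# float64 data (assumption); each row is 9 * 8 = 72 bytes = dt.itemsize
uarr_f = np.arange(18, dtype="f8").reshape(2, 9)

viewed = uarr_f.view(dt)   # reinterprets each row as one record
print(viewed.shape)        # (2, 1): the last axis collapses to length 1

rec = viewed[:, 0]         # drop the trailing axis; still shares memory
print(rec["measured"]["mean"])
```

Unlike unstructured_to_structured, a view never converts values, so it only gives sensible results when every field has the same type as the source array.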

With your data:

In [50]: arr1 = np.array([4, 5, 4, 5])
    ...: arr2 = np.array([0, 0, -1, -1])
    ...: arr3 = np.array([0.51, 0.89, 0.59, 0.94])
    ...: arr4 = np.array(
    ...:     [[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
    ...: ).T
    ...: arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T
    ...: arrs = (arr1, arr2, arr3, arr4, arr5)

In [51]: ans = np.column_stack(arrs)

In [52]: ans
Out[52]: 
array([[ 4.  ,  0.  ,  0.51,  0.52,  0.41,  0.68,  0.6 ,  0.2 ,  0.2 ],
       [ 5.  ,  0.  ,  0.89,  0.8 ,  0.71,  1.12,  0.6 ,  0.2 ,  0.2 ],
       [ 4.  , -1.  ,  0.59,  0.62,  0.46,  0.78,  0.6 ,  0.2 ,  0.2 ],
       [ 5.  , -1.  ,  0.94,  1.1 ,  0.77,  1.19,  0.6 ,  0.2 ,  0.2 ]])

In [53]: arr2 = rf.unstructured_to_structured(ans, dtype=np.dtype(dtypes))

In [54]: arr2
Out[54]: 
array([(4.,  0., 0.51, (0.52, 0.41, 0.68), (0.6, 0.2, 0.2)),
       (5.,  0., 0.89, (0.8 , 0.71, 1.12), (0.6, 0.2, 0.2)),
       (4., -1., 0.59, (0.62, 0.46, 0.78), (0.6, 0.2, 0.2)),
       (5., -1., 0.94, (1.1 , 0.77, 1.19), (0.6, 0.2, 0.2))],
      dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])
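For Problem 2, the field names come with the dtype, so they can be checked on the result directly. A short self-contained sketch, assuming the arrays from the question and the dtype without the (4,); rec_array is the name used in the question, and dt is introduced here only for brevity:

```python
import numpy as np
import numpy.lib.recfunctions as rf

arr1 = np.array([4, 5, 4, 5])
arr2 = np.array([0, 0, -1, -1])
arr3 = np.array([0.51, 0.89, 0.59, 0.94])
arr4 = np.array(
    [[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
).T
arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T
arrs = (arr1, arr2, arr3, arr4, arr5)

dt = np.dtype([
    ("state", "f8"),
    ("variability", "f8"),
    ("target", "f8"),
    ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")]),
    ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")]),
])

# (4, 9) float64 array -> 4 records with nested fields
rec_array = rf.unstructured_to_structured(np.column_stack(arrs), dtype=dt)

print(rec_array.dtype.names)              # ('state', 'variability', 'target', 'measured', 'var')
print(rec_array["measured"].dtype.names)  # ('mean', 'low', 'hi')
print(rec_array["var"].dtype.names)       # ('mid', 'low', 'hi')
```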