Extract an ndarray from a np.void array

1. The npy file I used: https://github.com/mangomangomango0820/DataAnalysis/blob/master/NumPy/NumPyEx/NumPy_Ex1_3Dscatterplt.npy

2. After loading the npy file:

import numpy as np

data = np.load('NumPy_Ex1_3Dscatterplt.npy')
'''
[([   2,    2, 1920,  480],) ([   1,    3, 1923,  480],)
 ......
 ([   3,    3, 1923,  480],)]
 
 
⬆️ data.shape, (69,)
⬆️ data.dtype, [('f0', '<i8', (4,))]
⬆️ type(data), <class 'numpy.ndarray'>
⬆️ type(data[0]), <class 'numpy.void'>
'''

You can see that each row, e.g. data[0], has type <class 'numpy.void'>.

I want to get an ndarray from the data above that looks like this ⬇️

[[   2    2 1920  480]
...
 [   3    3 1923  480]]

The way I did it is ⬇️

# named `result` rather than `all`, which would shadow the Python builtin
result = np.array([data[i][0] for i in range(data.shape[0])])

'''
[[   2    2 1920  480]
...
 [   3    3 1923  480]]
'''

I am wondering if there's a smarter way to process the numpy.void class data and achieve the expected results.

CodePudding user response:

Here is the trick

data_clean = np.array(data.tolist())
print(data_clean)
print(data_clean.shape)

Output

[[[   2    2 1920  480]]

...............

 [[   3    3 1923  480]]]
(69, 1, 4)

If you don't like the extra dimension of size 1 in the middle, you can squeeze it out like this:

data_sqz = data_clean.squeeze()
print(data_sqz)
print(data_sqz.shape)

Output

[[   2    2 1920  480]
...
 [   3    3 1923  480]]
(69, 4)
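
One caveat worth noting: a bare squeeze() drops every axis of length 1, so a file holding a single record would collapse to shape (4,) instead of (1, 4). Below is a minimal sketch of that edge case, assuming a small stand-in array x with the same dtype as in the question rather than the actual file; passing axis=1 removes only the middle axis.

```python
import numpy as np

# Stand-in for the loaded file: same dtype as in the question, but only one record.
dt = np.dtype([("f0", "<i8", (4,))])
x = np.array([([2, 2, 1920, 480],)], dtype=dt)

clean = np.array(x.tolist())           # shape (1, 1, 4)
squeezed_all = clean.squeeze()         # drops both length-1 axes
squeezed_mid = clean.squeeze(axis=1)   # drops only the middle axis

print(clean.shape, squeezed_all.shape, squeezed_mid.shape)
```

With 69 records both calls give (69, 4); the difference only shows up when another axis happens to have length 1.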

CodePudding user response:

Your data is a structured array, with a compound dtype.

https://numpy.org/doc/stable/user/basics.rec.html

I can recreate it with:

In [130]: dt = np.dtype([("f0", "<i8", (4,))])
In [131]: x = np.array(
     ...:     [([2, 2, 1920, 480],), ([1, 3, 1923, 480],), ([3, 3, 1923, 480],)], dtype=dt
     ...: )
In [132]: x
Out[132]: 
array([([   2,    2, 1920,  480],), ([   1,    3, 1923,  480],),
       ([   3,    3, 1923,  480],)], dtype=[('f0', '<i8', (4,))])

This is a 1d array with one field, and the field itself contains 4 elements.

Fields are accessed by name:

In [133]: x["f0"]
Out[133]: 
array([[   2,    2, 1920,  480],
       [   1,    3, 1923,  480],
       [   3,    3, 1923,  480]])

This has integer dtype with shape (3,4).
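
For the question, that would suggest data["f0"] can stand in for the list comprehension. The sketch below uses the recreated x rather than the original file. It checks that both approaches give the same values and that field access returns a view on the original buffer, not a copy.

```python
import numpy as np

dt = np.dtype([("f0", "<i8", (4,))])
x = np.array(
    [([2, 2, 1920, 480],), ([1, 3, 1923, 480],), ([3, 3, 1923, 480],)], dtype=dt
)

by_field = x["f0"]                                        # no Python-level loop
by_loop = np.array([x[i][0] for i in range(x.shape[0])])  # approach from the question

print(by_field.shape)                     # (3, 4)
print(np.array_equal(by_field, by_loop))  # True
print(np.shares_memory(by_field, x))      # True: field access returns a view
```

Because the result is a view, writing into by_field also changes x; call .copy() if an independent array is needed.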

Accessing fields by name applies to more complex structured arrays as well.
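
As an illustration, here is a small sketch with a made-up three-field dtype (the field names id, score and box are hypothetical, not from the question). Each field comes back as an ordinary array, and indexing with a list of names gives another structured array.

```python
import numpy as np

# Hypothetical dtype for illustration: an integer id, a float score, a 4-element box.
dt = np.dtype([("id", "<i8"), ("score", "<f8"), ("box", "<i8", (4,))])
z = np.array(
    [(0, 0.5, [2, 2, 1920, 480]), (1, 0.9, [1, 3, 1923, 480])], dtype=dt
)

print(z["id"].shape)     # (2,)
print(z["score"].dtype)  # float64
print(z["box"].shape)    # (2, 4)

# A list of names selects several fields and returns another structured array.
sub = z[["id", "box"]]
print(sub.dtype.names)   # ('id', 'box')
```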

Using the tolist approach from the other answer:

In [134]: x.tolist()
Out[134]: 
[(array([   2,    2, 1920,  480]),),
 (array([   1,    3, 1923,  480]),),
 (array([   3,    3, 1923,  480]),)]

In [135]: np.array(x.tolist())           # (3,1,4) shape
Out[135]: 
array([[[   2,    2, 1920,  480]],

       [[   1,    3, 1923,  480]],

       [[   3,    3, 1923,  480]]])
In [136]: np.vstack(x.tolist())          # (3,4) shape
Out[136]: 
array([[   2,    2, 1920,  480],
       [   1,    3, 1923,  480],
       [   3,    3, 1923,  480]])

The documentation also suggests using:

In [137]: import numpy.lib.recfunctions as rf
In [138]: rf.structured_to_unstructured(x)
Out[138]: 
array([[   2,    2, 1920,  480],
       [   1,    3, 1923,  480],
       [   3,    3, 1923,  480]])
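
With a single field this matches x["f0"]. With several fields, structured_to_unstructured lays all fields side by side along the last axis. A rough sketch, again with a hypothetical two-field dtype that is not part of the question:

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Hypothetical dtype: a scalar id followed by a 4-element box.
dt = np.dtype([("id", "<i8"), ("box", "<i8", (4,))])
z = np.array([(7, [2, 2, 1920, 480]), (8, [1, 3, 1923, 480])], dtype=dt)

flat = rf.structured_to_unstructured(z)
print(flat.shape)  # (2, 5): one id column plus four box columns
print(flat)
```

If only one field is wanted, plain field access such as z["box"] is the simpler route.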

An element of a structured array displays as a tuple, though its type is the generic np.void.
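
A short sketch of what that means in practice, using a two-record version of x: a single element carries the same field names as the array and can be indexed by name or by position.

```python
import numpy as np

dt = np.dtype([("f0", "<i8", (4,))])
x = np.array([([2, 2, 1920, 480],), ([1, 3, 1923, 480],)], dtype=dt)

rec = x[0]
print(type(rec))        # <class 'numpy.void'>
print(rec.dtype.names)  # ('f0',)
print(rec["f0"])        # the 4-element integer array, by field name
print(rec[0])           # the same field, by position
```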

There is an older class, recarray, that is similar but adds attribute-style access to fields:

In [146]: y=x.view(np.recarray)
In [147]: y
Out[147]: 
rec.array([([   2,    2, 1920,  480],), ([   1,    3, 1923,  480],),
           ([   3,    3, 1923,  480],)],
          dtype=[('f0', '<i8', (4,))])
In [148]: y.f0
Out[148]: 
array([[   2,    2, 1920,  480],
       [   1,    3, 1923,  480],
       [   3,    3, 1923,  480]])
In [149]: type(y[0])
Out[149]: numpy.record

I often refer to elements of structured arrays as records.