How to convert my array obtained from C to a 2D numpy array in Python without for loop

Time:04-15

I used the NumPy C API in C and got the following array in Python:

>>> my_array
array([array([20211101., 20211101., 20211101., 20211101., 20211101.]),
       array([10601155, 10603088, 10603982, 10600983, 10603283], dtype=int32),
       array([30000011, 30000021, 30000031, 30000041, 30000051], dtype=int32),
       array([93003000., 93003000., 93003000., 93003000., 93003000.]),
       array([-1., -1., -1.,  1., -1.]),
       array([b'Sell', b'Sell', b'Sell', b'Buy', b'Sell'], dtype='|S4'),
       array([b'SQZ', b'SQZ', b'SQZ', b'SQZ', b'SQZ'], dtype='|S4'),
       array([ 100, 1100,  100,  200,  200], dtype=int32),
       array([34.19,  9.97, 29.46,  8.96, 27.85]),
       array([b'5', b'0', b'5', b'0', b'0'], dtype='|S4')], dtype=object)

The shape of this array is

>>> my_array.shape
(10,)

My goal is to convert this array to a 2D NumPy array and create a DataFrame with pd.DataFrame(data=my_array). But it fails, because I need to pass in a NumPy array like

np.array([[...],[...],[...],...])

not

array([array([...]),array([...]),array([...]),...])

I understand that I can use a for loop to build the DataFrame, but that would be very slow on a large dataset. Is there a way to convert my array to a real 2D NumPy array and get a DataFrame object?

CodePudding user response:

Making a list from your sample:

In [132]: alist
Out[132]: 
[array([20211101., 20211101., 20211101., 20211101., 20211101.]),
 array([10601155, 10603088, 10603982, 10600983, 10603283], dtype=int32),
 array([30000011, 30000021, 30000031, 30000041, 30000051], dtype=int32),
 array([93003000., 93003000., 93003000., 93003000., 93003000.]),
 array([-1., -1., -1.,  1., -1.]),
 array([b'Sell', b'Sell', b'Sell', b'Buy', b'Sell'], dtype='|S4'),
 array([b'SQZ', b'SQZ', b'SQZ', b'SQZ', b'SQZ'], dtype='|S4'),
 array([ 100, 1100,  100,  200,  200], dtype=int32),
 array([34.19,  9.97, 29.46,  8.96, 27.85]),
 array([b'5', b'0', b'5', b'0', b'0'], dtype='|S4')]

Using 'list transpose' to make a list of tuples, one per "row/record" of the frame:

In [133]: df = pd.DataFrame([tuple(x) for x in zip(*alist)])
In [134]: df
Out[134]: 
            0         1         2           3  ...       6     7      8     9
0  20211101.0  10601155  30000011  93003000.0  ...  b'SQZ'   100  34.19  b'5'
1  20211101.0  10603088  30000021  93003000.0  ...  b'SQZ'  1100   9.97  b'0'
2  20211101.0  10603982  30000031  93003000.0  ...  b'SQZ'   100  29.46  b'5'
3  20211101.0  10600983  30000041  93003000.0  ...  b'SQZ'   200   8.96  b'0'
4  20211101.0  10603283  30000051  93003000.0  ...  b'SQZ'   200  27.85  b'0'

[5 rows x 10 columns]
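
To make the 'list transpose' step concrete, here is a minimal sketch on a smaller, made-up stand-in for the data. The names cols, rows and df_rows are only for illustration. Note that zip already yields tuples, so the extra tuple() call in the session above is harmless but not required.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: three equal-length columns
# with different dtypes.
cols = [
    np.array([20211101.0, 20211101.0, 20211101.0]),
    np.array([100, 1100, 200], dtype=np.int32),
    np.array([b"Sell", b"Buy", b"Sell"], dtype="S4"),
]

# zip(*cols) yields one tuple per row, made of the i-th element of every column.
rows = list(zip(*cols))

# Each tuple becomes one row of the frame.
df_rows = pd.DataFrame(rows)
print(df_rows)
```

This still walks every row in Python, so it does not avoid the per-row cost the question is worried about.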

Since the subarrays are all the same length, making an object array from them requires some special handling: np.array(alist) would try to combine them into one 2D array rather than a 1D array of arrays. We can't just copy-n-paste your display.

In [135]: arr = np.zeros(len(alist),object)
In [136]: arr[:] = alist
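
Why the preallocation is needed can be seen on a tiny example. This is a sketch with made-up arrays a and b, not the original data; the names merged and nested are only for illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4, 5, 6], dtype=np.int32)

# Asking for dtype=object directly still merges equal-length subarrays
# into a single 2D array of Python objects.
merged = np.array([a, b], dtype=object)
print(merged.shape)

# Preallocating a 1D object array and assigning into it keeps each
# subarray intact as one element.
nested = np.empty(2, dtype=object)
nested[:] = [a, b]
print(nested.shape)
```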

This makes a 1D object array like yours, which works the same way as the list:

In [138]: df = pd.DataFrame([tuple(x) for x in zip(*arr)])

pandas may have another way of creating a frame with one column/Series per array in a list, but this is the best I can do from a NumPy base.
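
One candidate for such a pandas route is to pass a dict that maps a column label to each array, so the frame is built column by column rather than row by row. The sketch below uses a small made-up stand-in (alist_demo) rather than the original data, and the integer column labels are just an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the list (or 1D object array) of column arrays.
alist_demo = [
    np.array([20211101.0, 20211101.0, 20211101.0]),
    np.array([100, 1100, 200], dtype=np.int32),
    np.array([b"Sell", b"Buy", b"Sell"], dtype="S4"),
]

# dict(enumerate(...)) gives {0: first array, 1: second array, ...};
# pandas turns each value into one column.
df_cols = pd.DataFrame(dict(enumerate(alist_demo)))
print(df_cols)
print(df_cols.dtypes)
```

The only Python-level iteration here is over the arrays themselves (ten in the question), not over the rows, and each numeric column keeps its own dtype instead of being forced into one common type. The same call should work on the 1D object array, since iterating it yields the subarrays.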
