I used the NumPy C API in C and got the following array in Python:
>>> my_array
array([array([20211101., 20211101., 20211101., 20211101., 20211101.]),
array([10601155, 10603088, 10603982, 10600983, 10603283], dtype=int32),
array([30000011, 30000021, 30000031, 30000041, 30000051], dtype=int32),
array([93003000., 93003000., 93003000., 93003000., 93003000.]),
array([-1., -1., -1., 1., -1.]),
array([b'Sell', b'Sell', b'Sell', b'Buy', b'Sell'], dtype='|S4'),
array([b'SQZ', b'SQZ', b'SQZ', b'SQZ', b'SQZ'], dtype='|S4'),
array([ 100, 1100, 100, 200, 200], dtype=int32),
array([34.19, 9.97, 29.46, 8.96, 27.85]),
array([b'5', b'0', b'5', b'0', b'0'], dtype='|S4')], dtype=object)
The shape of this array is
>>> my_array.shape
(10,)
My goal is to convert this array to a 2D NumPy array and create a dataframe with pd.DataFrame(data=my_array). This fails, because pandas expects a NumPy array like
np.array([[...],[...],[...],...])
not
array([array([...]),array([...]),array([...]),...])
I understand that I can use a for loop to build the dataframe, but that would be very slow if the dataset is large. Is there a way to convert my array to a real 2D NumPy array and get a dataframe object?
Answer:
Making a list from your sample:
In [132]: alist
Out[132]:
[array([20211101., 20211101., 20211101., 20211101., 20211101.]),
array([10601155, 10603088, 10603982, 10600983, 10603283], dtype=int32),
array([30000011, 30000021, 30000031, 30000041, 30000051], dtype=int32),
array([93003000., 93003000., 93003000., 93003000., 93003000.]),
array([-1., -1., -1., 1., -1.]),
array([b'Sell', b'Sell', b'Sell', b'Buy', b'Sell'], dtype='|S4'),
array([b'SQZ', b'SQZ', b'SQZ', b'SQZ', b'SQZ'], dtype='|S4'),
array([ 100, 1100, 100, 200, 200], dtype=int32),
array([34.19, 9.97, 29.46, 8.96, 27.85]),
array([b'5', b'0', b'5', b'0', b'0'], dtype='|S4')]
Using 'list transpose' to make a list of tuples, one per "row/record" of the frame:
In [133]: df = pd.DataFrame([tuple(x) for x in zip(*alist)])
In [134]: df
Out[134]:
0 1 2 3 ... 6 7 8 9
0 20211101.0 10601155 30000011 93003000.0 ... b'SQZ' 100 34.19 b'5'
1 20211101.0 10603088 30000021 93003000.0 ... b'SQZ' 1100 9.97 b'0'
2 20211101.0 10603982 30000031 93003000.0 ... b'SQZ' 100 29.46 b'5'
3 20211101.0 10600983 30000041 93003000.0 ... b'SQZ' 200 8.96 b'0'
4 20211101.0 10603283 30000051 93003000.0 ... b'SQZ' 200 27.85 b'0'
[5 rows x 10 columns]
Because the subarrays all have the same length, making a 1d object array from them requires some special handling: np.array(alist) would try to build a 2d array instead. We also can't just copy-n-paste your display.
In [135]: arr = np.zeros(len(alist),object)
In [136]: arr[:] = alist
This makes a 1d object array like yours, which works the same way as the list:
In [138]: df = pd.DataFrame([tuple(x) for x in zip(*arr)])
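For anyone who wants to run this without the REPL prompts, here is a minimal self-contained sketch of the same two steps (preallocate an object array, then transpose with zip). The data is a shortened, hypothetical stand-in for the arrays in the question, not the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened stand-in for the question's data:
# four "column" arrays of three values each, with mixed dtypes.
alist = [
    np.array([20211101., 20211101., 20211101.]),
    np.array([10601155, 10603088, 10603982], dtype=np.int32),
    np.array([b'Sell', b'Buy', b'Sell'], dtype='S4'),
    np.array([34.19, 9.97, 29.46]),
]

# Preallocate a 1d object array and fill it, so NumPy keeps each
# subarray as a single element instead of building a 2d array.
arr = np.zeros(len(alist), dtype=object)
arr[:] = alist

# "List transpose": zip(*arr) yields one tuple per row/record.
df = pd.DataFrame([tuple(x) for x in zip(*arr)])
print(arr.shape, df.shape)
```

Each element of arr is still the original subarray, so the frame ends up with one row per position and one column per subarray.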
pandas may have another way of creating a frame with one column/series per array of a list, but this is the best I can do from a numpy base.
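One column-wise option, offered as a sketch rather than as part of the answer above: pd.DataFrame accepts a dict that maps column labels to 1d arrays, so the object array can be turned into a dict with enumerate. The data below is again a shortened, hypothetical stand-in, and the integer column labels are just the positions in the array.

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened stand-in for the question's data.
alist = [
    np.array([20211101., 20211101., 20211101.]),
    np.array([10601155, 10603088, 10603982], dtype=np.int32),
    np.array([b'Sell', b'Buy', b'Sell'], dtype='S4'),
    np.array([34.19, 9.97, 29.46]),
]
arr = np.zeros(len(alist), dtype=object)
arr[:] = alist

# Map column position -> subarray; each subarray becomes one column.
df = pd.DataFrame(dict(enumerate(arr)))
print(df.dtypes)
```

This hands each subarray to pandas as a whole column, so the numeric columns keep their own dtypes and no per-row tuples are built in Python. I have not timed it against the zip approach on a large dataset.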