I have a pandas column storing a NumPy array in each row. The df looks like this:
0 [38, 324, -21]
1 [41, 325, -19]
2 [41, 325, -19]
3 [42, 326, -20]
4 [42, 326, -19]
I want to convert this column into a single NumPy array so I can use it as training data for a model. I convert it like this:
arr = df.c.values
Now, I would expect the shape of this array to be (5,3). However, when I run:
arr.shape
I get this:
(5,)
Further, if I run:
arr[0].shape
I get (3,).
Why don't I just get shape (5,3) when I run arr.shape?
Answer:
You can see what df.c.values actually is by inspecting its output:
import numpy as np
import pandas as pd
df = pd.DataFrame()
df['c'] = [np.random.randint(-100, 100, 3) for _ in range(5)]
In [2]: df
Out[2]:
c
0 [-80, 4, -84]
1 [88, 32, 85]
2 [-11, 71, 37]
3 [-78, 93, 50]
4 [30, 29, 28]
In [3]: df.c.values
Out[3]:
array([array([-80, 4, -84]), array([88, 32, 85]),
array([-11, 71, 37]), array([-78, 93, 50]),
array([30, 29, 28])], dtype=object)
So df.c.values is a one-dimensional array of dtype object that contains 5 individual arrays (hence df.c.values.shape == (5,)), not a 2-D array.
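To confirm this directly, here is a minimal sketch that rebuilds a frame with the question's values and compares the dtype and shape of the outer container with those of a single element:

```python
import numpy as np
import pandas as pd

# Rebuild a frame like the one in the question: one 3-element array per row
df = pd.DataFrame({"c": [np.array([38, 324, -21]), np.array([41, 325, -19]),
                         np.array([41, 325, -19]), np.array([42, 326, -20]),
                         np.array([42, 326, -19])]})

outer = df["c"].values
print(outer.dtype)      # object: each slot holds a reference to a separate array
print(outer.shape)      # (5,): one slot per row, no second axis
print(type(outer[0]))   # each slot is its own numpy.ndarray
print(outer[0].shape)   # (3,): the inner shape is only visible per element
```

Because the outer array only stores references, NumPy has no single (5, 3) block of memory to describe, so it reports the shape of the container alone.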
To get a 2-D array, you need to stack the row arrays into a single array. A straightforward way is np.vstack():
arr = np.vstack(df.c.values)
arr.shape == (5,3)
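np.vstack is not the only option. Below is a minimal sketch of two alternatives, assuming every row holds a 1-D array of the same length: np.stack joins the rows along a new first axis, and np.array over the list returned by Series.tolist() gives the same 2-D result. The last part uses a made-up `ragged` series to show what happens when row lengths differ.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"c": [np.array([38, 324, -21]), np.array([41, 325, -19]),
                         np.array([41, 325, -19]), np.array([42, 326, -20]),
                         np.array([42, 326, -19])]})

# np.stack joins equal-shape 1-D arrays along a new first axis -> (n_rows, 3)
X = np.stack(df["c"].values)
print(X.shape)  # (5, 3)

# Building from a plain list of the row arrays gives the same 2-D result
X2 = np.array(df["c"].tolist())
print(np.array_equal(X, X2))  # True

# Hypothetical ragged column: rows of different lengths cannot be stacked
ragged = pd.Series([np.array([1, 2, 3]), np.array([4, 5])])
try:
    np.stack(ragged.values)
    ragged_ok = True
except ValueError as err:
    ragged_ok = False
    print("cannot stack:", err)
```

np.stack is slightly stricter than np.vstack: it requires every row to have exactly the same shape, while np.vstack would also concatenate 2-D pieces that merely share a column count. For a column of equal-length 1-D arrays, both produce the same (5, 3) array.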