Why is np.shape not showing all dimensions?

Time:12-10

I have a pandas column storing a np array in each row. The df looks like this:

0    [38, 324, -21]
1    [41, 325, -19]
2    [41, 325, -19]
3    [42, 326, -20]
4    [42, 326, -19]

I want to convert this column into a np array so I can use it as training data for a model. I convert it to one np array with this:

arr = df.c.values

Now, I would expect the shape of this array to be (5,3). However, when I run:

arr.shape

I get this:

(5,)

Further, if I run:

arr[0].shape

I get (3,).

Why don't I just get shape (5,3) when I run arr.shape?

CodePudding user response:

You can take a look at what df.c.values actually is by seeing what the output is:

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['c'] = [np.random.randint(-100, 100, 3) for i in range(5)]
In [2]: df
Out[2]:
    c
0   [-80, 4, -84]
1   [88, 32, 85]
2   [-11, 71, 37]
3   [-78, 93, 50]
4   [30, 29, 28]
In [3]: df.c.values
Out[3]: 
array([array([-80,   4, -84]), array([88, 32, 85]),
       array([-11,  71,  37]), array([-78,  93,  50]),
       array([30, 29, 28])], dtype=object)

So df.c.values is a 1-dimensional object array holding 5 separate arrays (hence df.c.values.shape == (5,)), not a 2D array. NumPy only reports the shape of the outer array; it does not look inside the objects it contains.
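A quick way to confirm this is to check the dtype and number of dimensions directly. The snippet below is a minimal sketch that rebuilds the question's five rows by hand (the `rows` name is just for illustration):

```python
import numpy as np
import pandas as pd

# Rebuild the question's column: one small array per row (illustrative data).
rows = [
    np.array([38, 324, -21]),
    np.array([41, 325, -19]),
    np.array([41, 325, -19]),
    np.array([42, 326, -20]),
    np.array([42, 326, -19]),
]
df = pd.DataFrame({"c": rows})

arr = df["c"].values

# The outer array is 1D and stores Python objects, not numbers.
print(arr.dtype)     # object
print(arr.ndim)      # 1
print(arr.shape)     # (5,)

# Each element is its own independent 1D array.
print(type(arr[0]))  # <class 'numpy.ndarray'>
print(arr[0].shape)  # (3,)
```

A dtype of object is the giveaway: the outer array only knows it holds 5 items, whatever they are.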

To get a 2D array, you need to stack them into a single array. A straightforward way is np.vstack():

arr = np.vstack(df.c.values)
arr.shape == (5,3)
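For comparison, here is a sketch of two other common ways to get the same 2D result, assuming every row has the same length (the data and the `rows` name are illustrative, taken from the question):

```python
import numpy as np
import pandas as pd

# Illustrative data matching the question: five rows of three integers each.
rows = [
    np.array([38, 324, -21]),
    np.array([41, 325, -19]),
    np.array([41, 325, -19]),
    np.array([42, 326, -20]),
    np.array([42, 326, -19]),
]
df = pd.DataFrame({"c": rows})

# Option 1: np.vstack, as in the answer above.
stacked_v = np.vstack(df["c"].values)

# Option 2: np.stack joins the row arrays along a new first axis.
stacked_s = np.stack(df["c"].values)

# Option 3: convert to a plain list of arrays and let np.array build the 2D array.
from_list = np.array(df["c"].tolist())

print(stacked_v.shape, stacked_s.shape, from_list.shape)
```

All three give a numeric array of shape (5, 3). If the rows had different lengths, np.vstack and np.stack would raise a ValueError, so it is worth checking the row lengths first when the data comes from an untrusted source.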